Last week I changed the hostnames of a batch of production machines with:

hostnamectl set-hostname xxx

Afterwards a colleague reported that /etc/resolv.conf had been wiped on two of those machines. All that was left in the file was:

# Generated by NetworkManager
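To find other machines in the same state, it helps to check whether /etc/resolv.conf still has any `nameserver` line. The sketch below is my own assumption and not part of the original troubleshooting. It treats a file with no `nameserver` entries as "wiped", and it also reports whether the NetworkManager header is present. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Union

NM_HEADER = "# Generated by NetworkManager"


def inspect_resolv_conf(text: str) -> Dict[str, Union[bool, List[str]]]:
    """Summarize a resolv.conf body: its nameservers and whether NetworkManager wrote it."""
    lines = text.splitlines()
    nameservers = []
    for line in lines:
        fields = line.split()
        # Only "nameserver <address>" lines count; comments, search and options are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            nameservers.append(fields[1])
    return {
        "nameservers": nameservers,
        "written_by_nm": any(line.strip() == NM_HEADER for line in lines),
        # Assumption: a file with no nameserver line at all counts as "wiped".
        "wiped": not nameservers,
    }


def check_resolv_conf(path: str = "/etc/resolv.conf") -> Dict[str, Union[bool, List[str]]]:
    """Read the file from disk and summarize it (raises if the file is missing)."""
    return inspect_resolv_conf(Path(path).read_text(encoding="utf-8", errors="replace"))
```

Run through something like Ansible or a plain SSH loop, `check_resolv_conf()["wiped"]` flags hosts that look like the two described here.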

Uh-oh. My first instinct was that the two events had to be related and that I had a blind spot here. A quick search confirmed it:

Bug 1344303 - hostnamectl set-hostname over-writes existing resolv.conf entries

/var/log/messages on the two affected machines contained log entries similar to the ones in the linked bug:

Jun 22 13:48:46 test NetworkManager[605]: <info>  Setting system hostname to 'test' (from system configuration)
Jun 22 13:48:46 test dbus[610]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jun 22 13:48:46 test dbus-daemon[610]: dbus[610]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jun 22 13:48:46 test systemd[1]: Starting Network Manager Script Dispatcher Service...
Jun 22 13:48:46 test dbus[610]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jun 22 13:48:46 test dbus-daemon[610]: dbus[610]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jun 22 13:48:46 test systemd[1]: Started Network Manager Script Dispatcher Service.
Jun 22 13:48:46 test nm-dispatcher[3006]: Dispatching action 'hostname'

The NetworkManager service was indeed running, and there was a related log entry as well:

settings: hostname changed from "" to ""
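Going through the logs host by host can be partly automated. The following is a minimal sketch that assumes the messages look like the lines quoted above. It only matches those literal substrings, and other NetworkManager versions may word them differently. `hostname_events` and `scan_log` are hypothetical helper names.

```python
from typing import Iterable, List

# Substrings copied from the log lines quoted above; other NetworkManager
# versions may word these messages differently.
HOSTNAME_MARKERS = (
    "Setting system hostname to",
    "settings: hostname changed from",
    "Dispatching action 'hostname'",
)


def hostname_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a NetworkManager hostname event."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in HOSTNAME_MARKERS)]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Scan a syslog file; reading /var/log/messages usually requires root."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return hostname_events(f)
```

A non-empty result from `scan_log()` only shows that NetworkManager reacted to a hostname change. It does not prove that resolv.conf was rewritten, so pair it with a look at the file itself.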

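Oddly, though, I had changed the hostname on dozens of servers, and dozens of them were running NetworkManager. All the others seemed fine. To reproduce the problem I changed the hostname on these two machines again by hand, but could not trigger it. orz. My guess is that it depends on what resolv.conf originally contained; pinning down the details would probably mean reading the source code.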
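A reproduction attempt is easier to judge when the file is snapshotted before and after the command. The sketch below is a hypothetical harness, not what I actually ran. It assumes that NetworkManager rewrites the file within a short delay after the hostname change, and the default `settle` value is a guess.

```python
import difflib
import subprocess
import time
from pathlib import Path
from typing import List, Sequence


def run_and_diff(cmd: Sequence[str], path: str = "/etc/resolv.conf", settle: float = 2.0) -> List[str]:
    """Run `cmd`, then return a unified diff of `path` before vs. after (empty list = unchanged)."""
    target = Path(path)
    before = target.read_text(encoding="utf-8").splitlines(keepends=True)
    subprocess.run(list(cmd), check=True)
    # NetworkManager reacts to the hostname change asynchronously, so wait a
    # little before re-reading; the default delay is a guess, not a known bound.
    time.sleep(settle)
    after = target.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(before, after, fromfile=f"{path} (before)", tofile=f"{path} (after)")
    )
```

On a test machine one would call it as root, e.g. `run_and_diff(["hostnamectl", "set-hostname", "test"])`. Keeping a copy of the original resolv.conf contents alongside the diff would also help check the "depends on the original content" guess.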

Bug 1344303

That said, setting aside my own blind spot and the things I took for granted, this bug really is rather baffling. As one commenter on the bug put it:

My point is still, that hostnamectl does not look like you are editing resolv.conf. but I’m with you that there is a relation of the domain name and the FQDN.

I’ll update NetworkManager.conf with main/dns=none - no problem. I just wished I knew earlier howto disable the management of resolv.conf
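The commenter's `main/dns=none` means `dns=none` in the `[main]` section of NetworkManager.conf. The sketch below checks whether that setting is in effect. It is a simplified approximation. It reads /etc/NetworkManager/NetworkManager.conf and then the /etc/NetworkManager/conf.d/*.conf drop-ins in name order, and lets later files override earlier ones. It ignores the other configuration directories NetworkManager may also read, so NetworkManager.conf(5) remains the authority on the exact merge rules. The helper names are hypothetical.

```python
import configparser
import glob
from typing import Iterable, List, Optional


def effective_dns_mode(config_texts: Iterable[str]) -> Optional[str]:
    """Return the last [main] dns= value found; later texts override earlier ones."""
    mode = None
    for text in config_texts:
        # interpolation=None: values may contain '%'; strict=False: tolerate duplicates.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read_string(text)
        if parser.has_option("main", "dns"):
            mode = parser.get("main", "dns").strip()
    return mode


def load_nm_configs(
    main_conf: str = "/etc/NetworkManager/NetworkManager.conf",
    conf_d: str = "/etc/NetworkManager/conf.d",
) -> List[str]:
    """Read the main config, then drop-ins sorted by name (simplified ordering)."""
    texts = []
    for path in [main_conf] + sorted(glob.glob(f"{conf_d}/*.conf")):
        try:
            with open(path, encoding="utf-8") as f:
                texts.append(f.read())
        except FileNotFoundError:
            pass
    return texts
```

If `effective_dns_mode(load_nm_configs())` returns `"none"`, NetworkManager should leave resolv.conf alone. `None` means no explicit setting was found in the files checked.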

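The developer, however, stood firm; the replies read a bit like "use it or leave it". All I can say is that, as the ops engineer, I had not initialized the machines we hand over consistently enough, and that is how I fell into this pit. OK, so I added this to the initialization script (for CentOS 7.x):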

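# CentOS 7.x only: stop NetworkManager so it no longer rewrites /etc/resolv.conf.
# Skipped on other major versions (see the RHEL 8 note below).
- name: Disable NetworkManager
  systemd:
    name: NetworkManager
    state: stopped
    enabled: false
  when: ansible_distribution_major_version == "7"
  ignore_errors: true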

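Thanks to a reader who wrote in to point out: "In RHEL 8, network.service has been removed and all network configuration belongs to NetworkManager, so this may not really apply there."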

For details, see the article "A detailed guide to network IP configuration on RHEL8/CentOS8" (基于RHEL8/CentOS8的网络IP配置详解).

The Linux world just seems to have this many wheels. I once thought systemd would unify everything; instead it brought even more wheels, haha.

Taking back control

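I came across the article How to take back control of /etc/resolv.conf on Linux. It turns out quite a few components can take over this file. See the article for details; the author keeps updating it and has clearly put real care into it.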

  • NetworkManager
  • /etc/sysconfig/network/config: NETCONFIG_DNS_POLICY
  • resolvconf, rdnssd
  • systemd-resolved

Oh, and your DHCP client may touch your resolv.conf too.
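When a machine's resolv.conf changes unexpectedly, a first guess at the culprit can come from the file's comment header and from whether it is a symlink. The sketch below is a rough heuristic based on my own assumptions about typical headers and symlink targets. Wording varies across distributions and versions, and rdnssd and other tools from the list are not covered. Both function names are hypothetical.

```python
import os
from typing import List, Optional

# Heuristic comment markers -> likely owner. Header wording differs between
# distributions and versions; rdnssd and others in the list are not covered.
HEADER_MARKERS = {
    "Generated by NetworkManager": "NetworkManager",
    "systemd-resolved": "systemd-resolved",
    "resolvconf": "resolvconf",
    "netconfig": "netconfig (NETCONFIG_DNS_POLICY)",
    "dhclient-script": "dhclient-script (DHCP)",
}


def guess_from_header(text: str) -> List[str]:
    """Guess which tool wrote a resolv.conf body by looking only at its comment lines."""
    # resolv.conf treats lines starting with '#' or ';' as comments.
    comments = [line for line in text.splitlines() if line.lstrip().startswith(("#", ";"))]
    return [owner for marker, owner in HEADER_MARKERS.items() if any(marker in c for c in comments)]


def guess_from_symlink(path: str = "/etc/resolv.conf") -> Optional[str]:
    """If `path` is a symlink, guess the owner from where it points."""
    if not os.path.islink(path):
        return None
    target = os.path.realpath(path)
    if "/systemd/resolve/" in target:
        return "systemd-resolved"
    if "resolvconf" in target:
        return "resolvconf"
    return f"unknown (points to {target})"
```

An empty result only means none of these markers matched. It does not mean nothing manages the file.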

There is no end to learning: beyond every pit lies another pit.