Red Hat server fails to reboot: the OS hangs during service startup

After a reboot, a Red Hat server hangs at the service startup screen. /var/log/messages contains entries like the following:
Nov 16 16:40:02 localhost kernel: INFO: task ifup-eth:3793 blocked for more than 120 seconds.
Nov 16 16:40:02 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 16 16:40:02 localhost kernel: ifup-eth D 0000000000000001 0 3793 3521 0x00000000
Nov 16 16:40:02 localhost kernel: ffff88404a4bbce8 0000000000000082 0000000000000000 ffff8840500c9500
Nov 16 16:40:02 localhost kernel: ffff8840500c9538 0000000000000000 ffff88404a4bbca8 ffffffff81064a00
Nov 16 16:40:02 localhost kernel: ffff8840500c9ab8 ffff88404a4bbfd8 000000000000fb88 ffff8840500c9ab8
Nov 16 16:40:02 localhost kernel: Call Trace:
Nov 16 16:40:02 localhost kernel: [<ffffffff81064a00>] ? pick_next_task_fair+0xd0/0x130
Nov 16 16:40:02 localhost kernel: [<ffffffff8150d7ff>] ? thread_return+0x16d/0x76e
Nov 16 16:40:02 localhost kernel: [<ffffffff8150e555>] schedule_timeout+0x215/0x2e0
Nov 16 16:40:02 localhost kernel: [<ffffffff810669eb>] ? enqueue_rt_entity+0x6b/0x80
Nov 16 16:40:02 localhost kernel: [<ffffffff8150e1d3>] wait_for_common+0x123/0x180
Nov 16 16:40:02 localhost kernel: [<ffffffff81063310>] ? default_wake_function+0x0/0x20
Nov 16 16:40:02 localhost kernel: [<ffffffff8150e2ed>] wait_for_completion+0x1d/0x20
Nov 16 16:40:02 localhost kernel: [<ffffffff8106513c>] sched_exec+0xdc/0xe0
Nov 16 16:40:02 localhost kernel: [<ffffffff81189fc0>] do_execve+0xe0/0x2c0
Nov 16 16:40:02 localhost kernel: [<ffffffff810095ea>] sys_execve+0x4a/0x80
Nov 16 16:40:02 localhost kernel: [<ffffffff8100b4ca>] stub_execve+0x6a/0xc0
The service that hangs is not always the same one.
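Since the blocked service varies, it can help to tally which tasks the kernel reports as hung. Below is a minimal Python sketch, assuming the log lines follow the "INFO: task NAME:PID blocked for more than N seconds" format shown above. The function name hung_tasks is made up for illustration, and the sample line is copied from the log excerpt.

```python
import re
from collections import Counter

# Matches the hung-task report line as it appears in the log excerpt above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def hung_tasks(lines):
    """Count hung-task reports per task name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "Nov 16 16:40:02 localhost kernel: INFO: task ifup-eth:3793 "
        "blocked for more than 120 seconds.",
        "Nov 16 16:40:02 localhost kernel: Call Trace:",
    ]
    for name, count in hung_tasks(sample).most_common():
        print(name, count)
```

To run it against a real log, pass an open file object instead of the sample list, for example hung_tasks(open("/var/log/messages", errors="replace")).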


2 Answers

grander
admin endorsed this answer
This is a known bug. See erratum BT81 in http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf, excerpted below:
BT81. TSC is Not Affected by Warm Reset
Problem : The TSC (Time Stamp Counter MSR 10H) should be cleared on reset. Due to this erratum the TSC is not affected by warm reset.
Implication : The TSC is not cleared by a warm reset. The TSC is cleared by power-on reset as expected. Intel has not observed any functional failures due to this erratum.
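The problem is a Linux kernel bug that may be triggered when the following conditions are met:
1) The operating system is Red Hat Enterprise Linux 6.1 through 6.4 (6.5 and later are not affected).
2) The CPU belongs to the Intel® Xeon® E5, Intel® Xeon® E5 v2, or Intel® Xeon® E7 v2 family.
3) The server has not been powered off and back on for roughly 200 days or more. A plain reboot does not count, because per BT81 a warm reset does not clear the TSC.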
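The three conditions can be checked roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the release string is assumed to look like the contents of /etc/redhat-release, the CPU string is assumed to look like a "model name" line from /proc/cpuinfo, and the regular expression for the CPU family is a heuristic. The days since the last power cycle must be supplied by the operator, because /proc/uptime restarts on every warm reboot while the TSC does not.

```python
import re

def rhel_release_affected(release_text):
    """True for RHEL 6.1 through 6.4, e.g. '... release 6.4 (Santiago)'."""
    match = re.search(r"release (\d+)\.(\d+)", release_text)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 6 and 1 <= minor <= 4

def cpu_affected(model_name):
    """Heuristic: True for Xeon E5, E5 v2 and E7 v2 model-name strings."""
    match = re.search(r"Xeon.*?\b(E5|E7)-\s*\w+(?:\s+v(\d+))?", model_name)
    if match is None:
        return False
    family = match.group(1)
    # No "vN" suffix is treated as the first generation (assumption).
    generation = int(match.group(2)) if match.group(2) else 1
    if family == "E5":
        return generation in (1, 2)
    return generation == 2

def may_trigger(release_text, model_name, days_since_power_cycle,
                threshold_days=200):
    """Combine the three conditions listed in the answer above."""
    return (rhel_release_affected(release_text)
            and cpu_affected(model_name)
            and days_since_power_cycle >= threshold_days)

if __name__ == "__main__":
    # Example values for a hypothetical host.
    print(may_trigger(
        "Red Hat Enterprise Linux Server release 6.4 (Santiago)",
        "Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz",
        230,
    ))
```

A result of True only means the host matches the conditions above; it does not prove that a given hang was caused by this bug.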

2018-9-6 14:26
grander
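admin endorsed this answer
As a temporary workaround, force a power-off and then power the server back on; it will then boot normally. Alternatively, upgrade the operating system.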
2018-9-6 14:27
