Note: I will keep expanding this post as I learn from my own hard-won experience!

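The power went out in the middle of a system update. System files were left incomplete, and the system would no longer boot. After a lot of fiddling I finally fixed it, and I picked up a few lessons along the way, collected below: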

Problems I have run into with Arch:

I. Common causes:

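  1. An upgrade broke the system
  2. A sudden power loss left system files missing
  3. Something I compiled myself was built incorrectly
  4. A virus?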

II. Severity:

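  1. Extremely severe: user files are lost
  2. Very severe: a partition is damaged
  3. Severe
    1. The bootloader is broken
    2. The kernel is broken
    3. A service needed at boot is broken
    4. X will not start
    5. X starts, but with serious errors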

III. Things that help when solving problems:

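  1. A list of the files installed by each package. Arch keeps one, but how do you look it up? pacman -Ql <package> lists a package's files, and pacman -Qo <file> tells you which package owns a file.
  2. Backups, ideally on a filesystem with version-control-like features (snapshots).
  3. Internet access. The web is a huge textbook where you can look up how others solved the same problem.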
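For item 1, after an interrupted upgrade it helps to ask pacman which packages are now missing files: pacman -Qk checks every installed package and prints one summary line per package. A minimal sketch, assuming that summary format (e.g. `glibc: 2000 total files, 1 missing file`); `missing_pkgs` is a made-up name, and since it only reads text on stdin it can be tried without pacman.

```shell
#!/bin/sh
# missing_pkgs (hypothetical helper): read `pacman -Qk` output on stdin and
# print every package whose summary line reports at least one missing file.
# Assumes summary lines like "glibc: 2000 total files, 1 missing file".
missing_pkgs() {
    awk '/ total files, [0-9]+ missing files?$/ {
        n = $(NF - 2)              # the count right before "missing file(s)"
        if (n + 0 > 0) {
            name = $1
            sub(/:$/, "", name)    # strip the trailing colon
            print name
        }
    }'
}
```

On the broken system (for example from the chroot described in section V), `pacman -Qk 2>&1 | missing_pkgs` lists the candidates, and `pacman -S $(pacman -Qk 2>&1 | missing_pkgs)` would reinstall them.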

IV. Rescue levels, from the most drastic to the lightest:

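  1. Recover data directly from the disk: for the most severe cases
  2. Boot from other media: for very severe cases
  3. Append init=/bin/sh rw to the kernel options in the bootloader to start a shell
  4. systemd emergency mode
  5. systemd rescue mode
  6. Switch to another virtual terminal
  7. The login manager
  8. The desktop environment
  9. The application's own interface
  10. And so on.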

Below I only cover fixes for the most severe cases; the others are easy to deal with.

V. How to fix the problems

1. Booting from other media:

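Once booted from it, you can mount the installed system, edit its configuration, copy files in, delete files and so on. A system installation works exactly this way.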

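I. Preparation: any Linux installation CD, bootable USB stick, flash drive, external hard disk, etc. will do, ideally one with a complete set of tools; a graphical or a plain command-line environment are both fine. I usually use Slax (graphical, and you can install your own software in it) and SystemRescueCd (command line, Gentoo-based at the time).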

II. Procedure:

1. Boot into the installation (live) system

2. Create a mount point and mount the disk you need:

mkdir /mnt/xxx
mount /dev/sdaX /mnt/xxx

3. chroot: switch over to the root of the system being rescued. First mount the other filesystems it needs, for example:

cd /mnt/xxx
mount -t proc proc proc/
mount -t sysfs sys sys/
mount -o bind /dev dev/
mount -t devpts pts dev/pts/

To use the network inside the chroot, you also need this step:

cp -L /etc/resolv.conf etc/resolv.conf

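Arch Linux makes this easier: arch-chroot /mnt/xxx does all of the above, including the chroot itself, in one go. Without it, enter the chroot by hand: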

chroot /mnt/xxx

4. Do whatever work is needed

For example, bring up the network:

dhcpcd eth0

Or rebuild the initramfs (the image used for booting):

mkinitcpio -p linux

5. Exit the chroot (give root back)

exit

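6. Unmount the disk (-R also unmounts proc, sys, dev and dev/pts mounted beneath it; a plain umount fails while they are still mounted)

umount -R /mnt/xxx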

7. Reboot

reboot
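Steps 2 to 6 can be collected into one script, which is handy when the rescue has to be repeated. A minimal sketch, assuming a POSIX shell on the live system; `run`, `rescue_session` and the `DRY_RUN` variable are names made up here, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 2-6 above as one function. With DRY_RUN=1 (the default)
# it only prints the commands; run it as root with DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rescue_session DEVICE MOUNTPOINT COMMAND...  (hypothetical name)
rescue_session() {
    dev=$1 mnt=$2
    shift 2
    run mkdir -p "$mnt"
    run mount "$dev" "$mnt"                   # step 2: mount the broken root
    run mount -t proc proc "$mnt/proc"        # step 3: pseudo-filesystems
    run mount -t sysfs sys "$mnt/sys"
    run mount -o bind /dev "$mnt/dev"
    run mount -t devpts pts "$mnt/dev/pts"
    run cp -L /etc/resolv.conf "$mnt/etc/resolv.conf"  # DNS inside the chroot
    run chroot "$mnt" "$@"                    # steps 4-5: do the work, leave
    run umount -R "$mnt"                      # step 6: unmount everything
}
```

For example, as root: `DRY_RUN=0 rescue_session /dev/sda2 /mnt/xxx mkinitcpio -p linux`. On Arch, once the root partition is mounted, `arch-chroot /mnt/xxx mkinitcpio -p linux` takes care of the other mounts and the chroot.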

2. Minimal-shell rescue:

Some examples:

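To boot without the graphical interface, switching to runlevel 3 is enough. Do it in the boot manager: in GRUB, for example, append 3 to the kernel line (the one containing root=/dev/xxx).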

boot: rescue root=/dev/hda2 3

To start only the kernel and drop into a minimal rescue shell, change init in the boot manager like this:

LILO: Linux init=/bin/sh rw

What you can do at this point, for example change the root password:

sh# passwd

Remount the root filesystem:

sh# mount -o remount,rw /         (read-write)

sh# mount -o remount,ro /         (read-only)

The reason for such a “drastic” approach (remounting read-only, then forcing a restart) is that otherwise, e.g. if you typed reboot at the prompt, your system would try to stop a myriad of services that are not actually running when you have booted with the above parameters to the kernel. The reboot would take longer (it could even get stuck at some point), and the output of all those failed init.d scripts is definitely not for everybody’s eyes. :)
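Put together, a session in the init=/bin/sh shell might look like the sketch below, assuming you booted with init=/bin/sh rw as shown above. Since no init is running, it syncs, remounts the root filesystem read-only and then forces the reboot with reboot -f (the same forced reboot used in the shutdown section); if that does not work from this shell, fall back to SysRq or the reset button. `minimal_shell_rescue`, `run` and `DRY_RUN` are made-up names, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch of a session in the init=/bin/sh shell. DRY_RUN=1 (the default)
# only prints the commands; set DRY_RUN=0 to run them for real.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

minimal_shell_rescue() {       # hypothetical name
    run mount -o remount,rw /  # harmless if you already booted with "rw"
    run passwd                 # e.g. reset a forgotten root password
    run sync                   # flush pending writes to disk
    run mount -o remount,ro /  # leave the root filesystem clean
    run reboot -f              # no init is running, so force the reboot
}
```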

Boot problems

If your machine gets stuck during boot, first check if the hang happens before or after control passes to systemd. Try to boot without rhgb and quiet on the kernel command line. If you see some messages like these:

```
Welcome to Fedora VERSION (codename)!
Starting name…
[  OK  ] Started name.
```

then systemd is running.
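Removing those two words by hand is easy to get wrong on a long kernel line. A hypothetical helper that only edits a string: it prints the line without `quiet` and `rhgb` (the latter is Fedora's graphical boot option); you would still type the result into the bootloader's editor, e.g. after pressing e on a GRUB menu entry.

```shell
#!/bin/sh
# verbose_cmdline LINE (hypothetical helper): print LINE without the
# "quiet" and "rhgb" words so that boot messages are shown.
# The body runs in a subshell so that "set -f" (no globbing) stays local.
verbose_cmdline() (
    set -f
    out=
    for word in $1; do
        case $word in
            quiet|rhgb) ;;                   # drop these two options
            *) out="${out:+$out }$word" ;;   # keep everything else
        esac
    done
    printf '%s\n' "$out"
)
```

For example, `verbose_cmdline "$(cat /proc/cmdline)"` shows what the current command line would look like.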

Debugging always gets easier if you can get a shell. If you do not get a login prompt, try switching to a different virtual terminal using CTRL+ALT+F<n> (e.g. CTRL+ALT+F2). Problems with X server startup may show up as a missing login prompt on tty1 while the other VTs still work.

If the boot stops without presenting you with a login on any virtual console, let it retry for up to 5 minutes before declaring it definitely stuck. There is a chance that a service that has trouble starting will be killed after this timeout and the boot will continue normally. Another possibility is that a device for an important mountpoint will fail to appear and you will be presented with emergency mode.

If you get neither a login prompt nor the emergency mode shell, reboot with CTRL+ALT+DEL.

If even that does not reboot the machine, force a reboot with SysRq or the reset button on the case.
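The SysRq route usually means the sequence s (emergency sync), u (remount all filesystems read-only), b (reboot immediately). On the keyboard that is Alt+SysRq+S, U, B, which only works if the kernel.sysrq setting allows it; as root you can also write the letters to /proc/sysrq-trigger, which is not restricted by that setting. A minimal sketch; the trigger path is a parameter only so the function can be tried on an ordinary file, and `sysrq_reboot` and `SYSRQ_DELAY` are made-up names.

```shell
#!/bin/sh
# sysrq_reboot [TRIGGER]: emergency sync, read-only remount, then reboot.
# TRIGGER defaults to /proc/sysrq-trigger; writing "b" there reboots
# immediately, so only use the default path if you mean it.
sysrq_reboot() {
    trigger=${1:-/proc/sysrq-trigger}
    for key in s u b; do
        printf '%s' "$key" > "$trigger"   # s=sync, u=remount ro, b=reboot
        sleep "${SYSRQ_DELAY:-2}"         # give sync/remount time to finish
    done
}
```

Only call it without an argument when you really want the machine to reboot at once.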

When you reboot, use a serial console if you can:

If you have a hardware serial console available or if you are debugging in a virtual machine (e.g. using virt-manager you can switch your view to a serial console in the menu View -> Text Consoles), you can ask systemd to log lots of useful debugging information to it by booting with:

systemd.log_level=debug systemd.log_target=console console=ttyS0,38400

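To enter rescue mode, append systemd.unit=rescue.target (or just 1) to the kernel line in the bootloader. This mode is meant for when the base system starts normally but later stages fail to start.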

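Once in, start or stop other services as the situation requires. If that still does not help, boot into emergency mode by appending systemd.unit=emergency.target (or just emergency) to the kernel line. In this mode the root filesystem is mounted read-only, so you must remount it read-write by hand: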

mount -o remount,rw /

Check that everything is mounted correctly and that /etc/fstab is right; once it is, run systemctl daemon-reload so that systemd reloads its configuration.

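If emergency mode does not work either, append init=/bin/sh to the kernel line. This is usually needed when systemd itself or some other essential library is damaged; the fix is generally to reinstall those packages and libraries.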

If even init=/bin/sh does not work, you need to boot from other media.
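The escalation above, together with the serial-console options mentioned earlier, boils down to a handful of kernel parameters. A hypothetical helper that prints the parameters for each level and appends them to an existing kernel line; the level names (rescue, emergency, shell, serial) are made up for this sketch.

```shell
#!/bin/sh
# rescue_param LEVEL: print the kernel parameter(s) for one rescue level
# from the text. The LEVEL names are made up for this sketch.
rescue_param() {
    case $1 in
        rescue)    echo 'systemd.unit=rescue.target' ;;
        emergency) echo 'systemd.unit=emergency.target' ;;
        shell)     echo 'init=/bin/sh rw' ;;
        serial)    echo 'systemd.log_level=debug systemd.log_target=console console=ttyS0,38400' ;;
        *)         echo "unknown level: $1" >&2; return 1 ;;
    esac
}

# with_param LINE LEVEL: append the parameter(s) for LEVEL to LINE.
with_param() {
    p=$(rescue_param "$2") || return 1
    printf '%s %s\n' "$1" "$p"
}
```

For example, `with_param "$(cat /proc/cmdline)" rescue` prints the current command line with `systemd.unit=rescue.target` appended; the shorter forms `1` and `emergency` mentioned above work as well.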

Early Debug Shell

You can enable shell access to be available very early in the startup process to fall back on and diagnose systemd related boot up issues with various systemctl commands. Enable it using: systemctl enable debug-shell.service

Tip: If your version of systemd is not new enough to have debug-shell.service, you can download the unit file from systemd git. Substitute /bin/bash for @sushell@ in the file.

Tip: If you find yourself in a situation where you cannot use systemctl (e.g. when setting this up from a different booted system), you can enable the service manually:

cd $PATH_TO_YOUR_ROOT_FS/etc/systemd/system
mkdir -p sysinit.target.wants
ln -s /lib/systemd/system/debug-shell.service sysinit.target.wants/

Once enabled, the next time you boot you will be able to switch to tty9 using CTRL+ALT+F9 and have a root shell there available from an early point in the booting process. You can use the shell for checking the status of services, reading logs, looking for stuck jobs with systemctl list-jobs, etc.

Warning: Use this shell only for debugging! Do not forget to disable debug-shell.service after you’ve finished debugging your boot problems. Leaving the root shell always available would be a security risk.
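The manual steps from the tip above, together with the clean-up the warning asks for, can be wrapped in two small functions that work on a mounted root filesystem, e.g. /mnt/xxx from a live system. A sketch under those assumptions; the function names are made up, and the symlink target is the path used in the tip (on distributions with a merged /usr, /lib/systemd/system and /usr/lib/systemd/system are the same directory). On the running system itself, systemctl disable debug-shell.service is the normal way to turn it off.

```shell
#!/bin/sh
# Enable or disable debug-shell.service inside another root filesystem,
# mirroring the manual steps above. ROOT is e.g. /mnt/xxx (hypothetical).
debug_shell_enable() {      # usage: debug_shell_enable ROOT
    wants="$1/etc/systemd/system/sysinit.target.wants"
    mkdir -p "$wants"
    # the same symlink as in the tip; it may dangle until booted into ROOT
    ln -sf /lib/systemd/system/debug-shell.service "$wants/"
}

debug_shell_disable() {     # usage: debug_shell_disable ROOT
    rm -f "$1/etc/systemd/system/sysinit.target.wants/debug-shell.service"
}
```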

If You Can Get a Shell

When you have systemd running to the extent that it can provide you with a shell, please use it to extract useful information for debugging. Boot with these parameters on the kernel command line:

systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M

in order to increase the verbosity of systemd, to let systemd write its logs to the kernel log buffer, and to increase the size of the kernel log buffer. After reaching the shell, save the log:

dmesg > dmesg.txt

When reporting a bug, attach the dmesg.txt file.

To check for possibly stuck jobs use:

systemctl list-jobs

The jobs that are listed as “running” are the ones that must complete before the “waiting” ones will be allowed to start executing.
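When the job list is long, a filter helps to see only the jobs that are holding up the boot. A minimal sketch, assuming the four-column JOB UNIT TYPE STATE layout that systemctl list-jobs prints (`--no-legend` drops the header and footer); `blocking_jobs` is a made-up name.

```shell
#!/bin/sh
# blocking_jobs: read `systemctl list-jobs --no-legend` output on stdin and
# print the units whose jobs are "running", i.e. the ones the "waiting"
# jobs are queued behind. Assumes the columns JOB UNIT TYPE STATE.
blocking_jobs() {
    awk 'NF >= 4 && $4 == "running" { print $2 }'
}
```

Usage: `systemctl list-jobs --no-legend | blocking_jobs`.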

Shutdown problems

Just like with boot problems, when you encounter a hang during shutting down, make sure you wait at least 5 minutes to distinguish a permanent hang from a broken service that’s just timing out. Then it’s worth testing whether the system reacts to CTRL+ALT+DEL in any way.

If shutdown (whether it be to reboot or power-off) of your system gets stuck, first test if the kernel itself is able to reboot or power-off the machine forcedly using one of these commands:

sync && reboot -f
sync && poweroff -f

If neither command works, the problem is in the kernel, not in systemd.

Shutdown takes a long time

If reboot and poweroff both work but take a long time, boot with debugging enabled:

systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M enforcing=0

Save the following script as /lib/systemd/system-shutdown/debug.sh and make it executable (chmod +x):

#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /

Then reboot.

Look for timeouts logged in the resulting file shutdown-log.txt and/or attach it to a bugreport.
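Writing the hook by hand is easy to get slightly wrong (it has to be executable and sit in the system-shutdown directory). A sketch that writes the debug.sh script from above into a directory given as an argument, so it can also be prepared from a rescue system against a mounted root; `install_shutdown_debug` is a made-up name. Remember to delete debug.sh afterwards, since systemd runs every executable in that directory on each shutdown.

```shell
#!/bin/sh
# install_shutdown_debug [DIR] (hypothetical name): write the debug.sh hook
# from the text into DIR (default /lib/systemd/system-shutdown) and make it
# executable.
install_shutdown_debug() {
    dir=${1:-/lib/systemd/system-shutdown}
    mkdir -p "$dir" || return 1
    cat > "$dir/debug.sh" <<'EOF'
#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /
EOF
    chmod +x "$dir/debug.sh"
}
```

For example, `install_shutdown_debug /mnt/xxx/lib/systemd/system-shutdown` from a live system, or simply `install_shutdown_debug` as root on the affected machine.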

The machine never shuts down

If normal reboot or poweroff never finish even after waiting a few minutes, the above method to create the shutdown log will not help and the log must be obtained using other methods. Two options that are useful for debugging boot problems can be used also for shutdown problems:

  1. Use a serial console.
  2. Use a debug shell: not only is it available from early boot, it also stays active until late shutdown.

Status and Logs of Services

When the start of a service fails, systemctl will give you a generic error message:

# systemctl start foo.service

Job failed. See system journal and ‘systemctl status’ for details.

The service may have printed its own error message, but you do not see it, because services run by systemd are not related to your login session and their outputs are not connected to your terminal. That does not mean the output is lost though. By default the stdout, stderr of services are directed to the systemd journal and the logs that services produce via syslog(3) go there too. systemd also stores the exit code of failed services. Let’s check:

```
# systemctl status foo.service
foo.service - mmm service
   Loaded: loaded (/etc/systemd/system/foo.service; static)
   Active: failed (Result: exit-code) since Fri, 11 May 2012 20:26:23 +0200; 4s ago
  Process: 1329 ExecStart=/usr/local/bin/foo (code=exited, status=1/FAILURE)
   CGroup: name=systemd:/system/foo.service

May 11 20:26:23 scratch foo[1329]: Failed to parse config
```

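In this example the service ran as PID 1329 and exited with an error, status 1. The information shown here is quite limited; for more complete logs, use journalctl, e.g. journalctl -u foo.service.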

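If a syslog daemon such as rsyslog is also running, the journal forwards messages to it as well, so you can also find them in /var/log/messages (the exact location depends on the rsyslog configuration).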
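To go from "something failed" to the relevant log quickly, the failed units can be listed and their journal shown one by one. A minimal sketch, assuming the output of systemctl --failed --plain --no-legend, where the unit name is the first column (a leading bullet is skipped in case one is printed); `failed_units` and `show_failed_logs` are made-up names. journalctl -b -u UNIT shows that unit's messages from the current boot.

```shell
#!/bin/sh
# failed_units: read `systemctl --failed --plain --no-legend` on stdin and
# print one unit name per line (first column, skipping a bullet if present).
failed_units() {
    awk 'NF { u = ($1 == "●" || $1 == "*") ? $2 : $1; print u }'
}

# show_failed_logs: print the last journal lines of every failed unit
# from the current boot. Needs systemd; run it on the affected machine.
show_failed_logs() {
    systemctl --failed --plain --no-legend | failed_units |
    while read -r unit; do
        echo "== $unit"
        journalctl -b -u "$unit" -n 20 --no-pager
    done
}
```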

Reporting systemd Bugs

Be prepared to include some information (logs) about your system as well. These should be complete (no snippets please), not in an archive, uncompressed, with MIME type set as text/plain.

Please report bugs to your distribution’s bug tracker first. If you are sure that you are encountering an upstream bug, then first check for existing bug reports, and if your issue is not listed file a new bug.

Information to Attach to a Bug Report

Whenever possible, the following should be mentioned and attached to your bug report:

The exact kernel command-line used if not default. Typically from the bootloader configuration file (e.g. /boot/grub2/grub.cfg) or from /proc/cmdline

A copy of the file /var/log/messages

The output of the dmesg command: dmesg > dmesg.txt

ideally after booting with systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M

The output of a systemd dump: systemctl dump > systemd-dump.txt

The output of /usr/bin/systemd --test --system --log-level=debug > systemd-test.txt 2>&1
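Gathering the items listed above can be scripted. A sketch that writes each item to its own plain-text file in a directory of your choice (attach them uncompressed, as requested above); `save`, `collect_bug_info` and the file names are assumptions. A command that is missing or lacks permission only leaves an error note in its file instead of aborting. Newer systemd versions also provide the dump via systemd-analyze dump.

```shell
#!/bin/sh
# save FILE COMMAND...: run COMMAND with stdout and stderr captured in FILE;
# if it fails, append the exit status to FILE instead of aborting.
save() {
    f=$1
    shift
    "$@" > "$f" 2>&1 || echo "command failed with status $?: $*" >> "$f"
}

# collect_bug_info DIR (hypothetical name): one plain-text file per item.
collect_bug_info() {
    [ -n "$1" ] || { echo "usage: collect_bug_info DIR" >&2; return 1; }
    out=$1
    mkdir -p "$out" || return 1
    save "$out/cmdline.txt"      cat /proc/cmdline
    save "$out/messages.txt"     cat /var/log/messages
    save "$out/dmesg.txt"        dmesg
    save "$out/systemd-dump.txt" systemctl dump
    save "$out/systemd-test.txt" /usr/bin/systemd --test --system --log-level=debug
}
```

Run it as root, ideally after booting with systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M, e.g. `collect_bug_info /root/bugreport`.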

Common systemd commands

Show queued jobs: systemctl list-jobs. Use it to find what is slowing the boot: jobs listed as “running” are the ones the boot is waiting on, and jobs listed as “waiting” will only run once the “running” ones have completed.

List all systemd services and their states: systemctl list-units -t service --all

List active services: systemctl list-units -t service

Check whether a particular service is active, e.g. the ssh service: systemctl status sshd.service

List available targets: systemctl list-units -t target --all

List active targets: systemctl list-units -t target

Show the dependencies of a target: systemctl list-dependencies multi-user.target

systemd parameters available at boot

The following parameters are commonly used to debug booting: