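The 沃佳云 cloud environment shut down abnormally again, which brought down my k8s environment. Previously, restarting Harbor was enough to get it working again, but today nothing I tried would bring Harbor back.
Start the docker service first: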
[root@hdss7-200 ~]# systemctl start docker
Redeploy Harbor:
[root@hdss7-200 ~]# cd /opt/harbor
[root@hdss7-200 harbor]# ./install.sh
Removing f3181ac0cf37_harbor-portal ... error
Creating harbor-log ... done
 "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
Removing network harborv183_harbor
Creating harbor-core ...
Recreating f3181ac0cf37_harbor-portal ...
Recreating f3181ac0cf37_harbor-portal ... error
ERROR: for f3181ac0cf37_harbor-portal container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
ERROR: for portal container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
ERROR: Encountered errors while bringing up the project.
List the docker containers:
[root@hdss7-200 harbor]# docker ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
2991dd383a4d  goharbor/harbor-portal:v1.8.3  "nginx -g 'daemon of…"  5 minutes ago  Up 5 minutes (healthy)  80/tcp  harbor-portal
39a3a548010c  goharbor/harbor-jobservice:v1.8.3  "/harbor/start.sh"  5 minutes ago  Up 5 minutes  harbor-jobservice
8e77c46135f8  goharbor/harbor-core:v1.8.3  "/harbor/start.sh"  5 minutes ago  Up 5 minutes (healthy)  harbor-core
a451e99b8c61  goharbor/harbor-db:v1.8.3  "/entrypoint.sh post…"  5 minutes ago  Up 5 minutes (healthy)  5432/tcp  harbor-db
3e572508f475  goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3  "/entrypoint.sh /etc…"  5 minutes ago  Up 5 minutes (healthy)  5000/tcp  registry
473b7beada5a  goharbor/redis-photon:v1.8.3  "docker-entrypoint.s…"  5 minutes ago  Up 5 minutes  6379/tcp  redis
160b88a8c778  goharbor/harbor-registryctl:v1.8.3  "/harbor/start.sh"  5 minutes ago  Up 5 minutes (healthy)  registryctl
a5d2fc46e05e  goharbor/harbor-log:v1.8.3  "/bin/sh -c /usr/loc…"  5 minutes ago  Up 5 minutes (healthy)  127.0.0.1:1514->10514/tcp  harbor-log
f3181ac0cf37  goharbor/harbor-portal:v1.8.3  "nginx -g 'daemon of…"  2 weeks ago  Removal In Progress  f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
b2ba0b1ac992  724c576ca3fb  "/bin/sh -c 'echo \" …"  12 months ago  Exited (2) 12 months ago  determined_hermann
9f7b7e876bde  724c576ca3fb  "/bin/sh -c 'echo \" …"  12 months ago  Exited (2) 12 months ago  elegant_mirzakhani
fa90411e33fe  bf20ac214571  "/bin/sh -c 'echo \" …"  12 months ago  Exited (2) 12 months ago  dazzling_euclid
d3517506aac9  bf20ac214571  "/bin/sh -c 'echo \" …"  12 months ago  Exited (100) 12 months ago  friendly_snyder
096be691fbba  bf20ac214571  "/bin/sh -c 'echo \" …"  12 months ago  Exited (2) 12 months ago  zealous_rhodes
1e31fc83a48a  bf20ac214571  "/bin/sh -c 'echo \" …"  12 months ago  Exited (100) 12 months ago  epic_chebyshev
eed93e842291  bf20ac214571  "/bin/sh -c 'echo \" …"  12 months ago  Exited (100) 12 months ago  dreamy_shannon
6c2b6e889664  bf20ac214571  "/bin/sh -c 'echo \" …"  12 months ago  Exited (100) 12 months ago
Bulk-delete the exited containers:
[root@hdss7-200 harbor]# for i in `docker ps -a|grep -i exit|awk '{print $1}'`;do docker rm -f $i;done
b2ba0b1ac992
9f7b7e876bde
fa90411e33fe
d3517506aac9
096be691fbba
1e31fc83a48a
eed93e842291
6c2b6e889664
e9032d47ed14
84430f73fd64
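The `grep -i exit` loop above works, but it matches any row that contains "exit", including a container name or command. Below is a minimal sketch of the same cleanup using Docker's own status filter. The function name `remove_exited_containers` is made up for this example and is not part of Docker.

```shell
# Remove every container whose status is "exited".
# Relies on `docker ps --filter status=exited` instead of grepping the
# table output, so a name or command containing "exit" is not matched.
# NOTE: remove_exited_containers is a hypothetical helper name.
remove_exited_containers() {
    ids=$(docker ps -aq --filter status=exited)
    if [ -z "$ids" ]; then
        echo "no exited containers"
        return 0
    fi
    # Word splitting on $ids is intentional: one container ID per argument.
    docker rm $ids
}
```

Call it as `remove_exited_containers`. On Docker 1.13 and later, `docker container prune` does a similar job for all stopped containers, after asking for confirmation.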
The one leftover container cannot be deleted:
[root@hdss7-200 harbor]# docker ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
2991dd383a4d  goharbor/harbor-portal:v1.8.3  "nginx -g 'daemon of…"  10 minutes ago  Up 10 minutes (healthy)  80/tcp  harbor-portal
39a3a548010c  goharbor/harbor-jobservice:v1.8.3  "/harbor/start.sh"  10 minutes ago  Up 10 minutes  harbor-jobservice
8e77c46135f8  goharbor/harbor-core:v1.8.3  "/harbor/start.sh"  10 minutes ago  Up 10 minutes (healthy)  harbor-core
a451e99b8c61  goharbor/harbor-db:v1.8.3  "/entrypoint.sh post…"  10 minutes ago  Up 10 minutes (healthy)  5432/tcp  harbor-db
3e572508f475  goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3  "/entrypoint.sh /etc…"  10 minutes ago  Up 10 minutes (healthy)  5000/tcp  registry
473b7beada5a  goharbor/redis-photon:v1.8.3  "docker-entrypoint.s…"  10 minutes ago  Up 10 minutes  6379/tcp  redis
160b88a8c778  goharbor/harbor-registryctl:v1.8.3  "/harbor/start.sh"  10 minutes ago  Up 10 minutes (healthy)  registryctl
a5d2fc46e05e  goharbor/harbor-log:v1.8.3  "/bin/sh -c /usr/loc…"  10 minutes ago  Up 10 minutes (healthy)  127.0.0.1:1514->10514/tcp  harbor-log
f3181ac0cf37  goharbor/harbor-portal:v1.8.3  "nginx -g 'daemon of…"  2 weeks ago  Removal In Progress  f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
docker-compose cannot bring the services down cleanly either:
[root@hdss7-200 harbor]# docker-compose down
Stopping harbor-portal ... done
Stopping harbor-jobservice ... done
Stopping harbor-core ... done
Stopping registryctl ... done
Stopping redis ... done
Stopping harbor-db ... done
Stopping registry ... done
Stopping harbor-log ... done
Removing harbor-portal ... done
Removing harbor-jobservice ... done
Removing harbor-core ... done
Removing registryctl ... done
Removing redis ... done
Removing harbor-db ... done
Removing registry ... done
Removing harbor-log ... done
Removing f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal ... error
ERROR: for f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
Removing network harborv183_harbor
Stopping the docker service and deleting the directory by hand does not work either:
[root@hdss7-200 ~]# systemctl stop docker.service
[root@hdss7-200 ~]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
I tried every variation of the delete I could think of, and none of them worked:
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/*
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
Checking the file type also fails:
[root@hdss7-200 harbor]# file /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: cannot open (Input/output error)
Moving it to a temporary directory does not work either:
[root@hdss7-200 harbor]# mv /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7 /tmp
mv: cannot stat '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
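rm, file, and mv all fail on the same path with Input/output error, which suggests the problem sits below Docker, in the filesystem or the disk. The kernel log is usually the quickest place to confirm that. A small sketch, assuming the kernel log lines of interest mention XFS or "I/O error"; the helper name `filter_fs_errors` is hypothetical:

```shell
# Keep only kernel log lines that mention XFS or an I/O error.
# Reads log text on stdin, so it works with `dmesg` or `journalctl -k`.
# NOTE: filter_fs_errors is a hypothetical helper name, and the pattern
# is an assumption about what the relevant messages contain.
filter_fs_errors() {
    grep -Ei 'xfs|i/o error'
}

# Usage (as root):
#   dmesg | filter_fs_errors | tail -n 20
```

If XFS metadata errors show up for the device behind /data, deleting from the running system is unlikely to succeed and the filesystem itself needs attention.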
Remounting does not work either:
[root@hdss7-200 harbor]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=1910324k,nr_inodes=477581,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17256)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /data type xfs (rw,relatime,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=384404k,mode=700)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
[root@hdss7-200 harbor]# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/mapper/centos-root  50G  3.2G  47G  7%  /
devtmpfs  1.9G  0  1.9G  0%  /dev
tmpfs  1.9G  0  1.9G  0%  /dev/shm
tmpfs  1.9G  8.4M  1.9G  1%  /run
tmpfs  1.9G  0  1.9G  0%  /sys/fs/cgroup
/dev/sda2  1014M  179M  836M  18%  /boot
/dev/sda1  200M  12M  189M  6%  /boot/efi
/dev/mapper/centos-home  73G  9.9G  63G  14%  /data
tmpfs  376M  0  376M  0%  /run/user/0
[root@hdss7-200 harbor]# mount -o remount,rw /
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
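One detail in the mount output is worth noting: /data is its own XFS filesystem on /dev/mapper/centos-home, so remounting / does not touch the filesystem that holds the broken directory. Here is a small sketch for checking which mount point a path belongs to before remounting anything. The helper name `mount_point_of` is hypothetical.

```shell
# Print the mount point of the filesystem that contains the given path.
# `df -P` uses the POSIX output format: one line per filesystem, with the
# mount point in the sixth column.
# NOTE: mount_point_of is a hypothetical helper name. This simple awk
# parse assumes neither the device name nor the mount point has spaces.
mount_point_of() {
    df -P "$1" | awk 'NR==2 {print $6}'
}

# Usage:
#   mount_point_of /data/docker/overlay2
```

Judging by the df output above, on this host that call should report /data rather than /.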
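I contacted 王导 on QQ. To my surprise, he had not gone away for the May Day holiday.
In the end he logged in to my 沃佳云 server. He said he rebooted the virtual machine into "rescue mode" and remounted, after which the file could be deleted from the normal system.
Presumably he ran the following command: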
mount -o remount,rw /
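I do not know exactly what was run in rescue mode. For XFS, a common way to deal with a directory entry that returns Input/output error is to unmount the filesystem and run xfs_repair on it, which is not possible while /data is mounted and in use. The sketch below only prints such a sequence. It is an assumption about a typical repair, not a record of what was actually done, and the device and mount point are taken from the mount output earlier in this post.

```shell
# Print (do not run) a check-then-repair sequence for an XFS filesystem.
# ASSUMPTION: this is a generic XFS repair outline, not the commands that
# were actually used on this server.
# NOTE: xfs_repair_plan is a hypothetical helper name.
xfs_repair_plan() {
    dev=$1   # block device, e.g. /dev/mapper/centos-home
    mnt=$2   # where it is normally mounted, e.g. /data
    echo "umount $mnt"
    echo "xfs_repair -n $dev"   # -n: report problems only, change nothing
    echo "xfs_repair $dev"      # the actual repair; needs the fs unmounted
    echo "mount $dev $mnt"
}

xfs_repair_plan /dev/mapper/centos-home /data
```

Printing the plan first is deliberate: a repair modifies the filesystem, so it is worth reading the `xfs_repair -n` report before running the real thing. If xfs_repair complains that the log needs to be replayed, mounting and unmounting the filesystem once usually takes care of that.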
He presumably then deleted the file from the normal environment. After that, the container could be removed:
[root@hdss7-200 ~]# docker ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
f3181ac0cf37  goharbor/harbor-portal:v1.8.3  "nginx -g 'daemon of…"  2 weeks ago  Dead  f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
[root@hdss7-200 ~]# docker rm -f f318
f318
[root@hdss7-200 ~]# docker ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
Redeploying Harbor now succeeds:
[root@hdss7-200 ~]# cd /opt/harbor
[root@hdss7-200 harbor]# ./install.sh
[Step 0]: checking installation environment ...
Note: docker version: 19.03.7
Note: docker-compose version: 1.18.0
[Step 1]: loading Harbor images ...
Loaded image: goharbor/harbor-db:v1.8.3
Loaded image: goharbor/redis-photon:v1.8.3
Loaded image: goharbor/notary-signer-photon:v0.6.1-v1.8.3
Loaded image: goharbor/chartmuseum-photon:v0.9.0-v1.8.3
Loaded image: goharbor/harbor-core:v1.8.3
Loaded image: goharbor/harbor-log:v1.8.3
Loaded image: goharbor/harbor-registryctl:v1.8.3
Loaded image: goharbor/notary-server-photon:v0.6.1-v1.8.3
Loaded image: goharbor/clair-photon:v2.0.8-v1.8.3
Loaded image: goharbor/harbor-migrator:v1.8.3
Loaded image: goharbor/prepare:v1.8.3
Loaded image: goharbor/harbor-portal:v1.8.3
Loaded image: goharbor/nginx-photon:v1.8.3
Loaded image: goharbor/harbor-jobservice:v1.8.3
Loaded image: goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3
[Step 2]: preparing environment ...
prepare base dir is set to /opt/harbor-v1.8.3
Clearing the configuration file: /config/log/logrotate.conf
Clearing the configuration file: /config/nginx/nginx.conf
Clearing the configuration file: /config/core/env
Clearing the configuration file: /config/core/app.conf
Clearing the configuration file: /config/registry/config.yml
Clearing the configuration file: /config/registry/root.crt
Clearing the configuration file: /config/registryctl/env
Clearing the configuration file: /config/registryctl/config.yml
Clearing the configuration file: /config/db/env
Clearing the configuration file: /config/jobservice/env
Clearing the configuration file: /config/jobservice/config.yml
Generated configuration file: /config/log/logrotate.conf
Generated configuration file: /config/nginx/nginx.conf
Generated configuration file: /config/core/env
Generated configuration file: /config/core/app.conf
Generated configuration file: /config/registry/config.yml
Generated configuration file: /config/registryctl/env
Generated configuration file: /config/db/env
Generated configuration file: /config/jobservice/env
Generated configuration file: /config/jobservice/config.yml
Creating harbor-log ... done
Generated configuration file: /compose_location/docker-compose.yml
Clean up the input dir
Creating registry ... done
Creating harbor-core ... done
[Step 3]: starting Harbor ...
Creating harbor-portal ... done
Creating nginx ... done
Creating registry ...
Creating harbor-db ...
Creating registryctl ...
Creating redis ...
Creating harbor-core ...
Creating harbor-jobservice ...
Creating harbor-portal ...
Creating nginx ...
✔ ----Harbor has been installed and started successfully.----
Now you should be able to visit the admin portal at http://harbor.od.com.
For more details, please visit https://github.com/goharbor/harbor .
After refreshing the browser, the page is back to normal.
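1. In the end the problem was solved by asking for help, and I sent him a 20 yuan QQ red envelope.
2. As for the cause: 沃佳云 really is terrible. It has rebooted abnormally more than 10 times in the past 2 years, and no company would dare run on a public cloud like this. Their prices are genuinely cheap, though. And the repeated abnormal reboots have forced me through many failures, which has actually improved my troubleshooting skills.
3. A problem like this should be hard to run into in production. If a production environment rebooted this often (沃佳云 machines reboot almost every month), the company would have switched cloud platforms long ago.
4. Kubernetes is, in the end, container orchestration. If you only know k8s and are not fluent with docker, you will not be fluent with Kubernetes either.
5. When I studied CentOS with 老男孩 years ago, the only recovery tasks I learned were resetting the root password in single-user mode and editing /etc/fstab. That should count as Linux basics, but after so many years, and especially after using 阿里云 (Alibaba Cloud) for the past few years, those basics have grown rusty.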