
A Greenplum 6 failure and how it was recovered

Source: original to this site. Date: 2021-12-20
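Segment detection and failover mechanism
The GP master first probes the primary. If the primary is unreachable, it then probes the mirror. There are four possible primary/mirror states:

1. Primary up, mirror up. The master's probe of the primary succeeds and returns immediately, and the master moves on to the next segment.
2. Primary up, mirror down. The probe of the primary succeeds, and the status returned by the primary tells the master that the mirror is down (when the mirror goes down, the primary detects it and switches itself to ChangeTracking mode). The master updates its metadata and moves on to the next segment. (ChangeTracking belongs to the file-replication design of Greenplum 5 and earlier; Greenplum 6 uses WAL streaming replication instead.)
3. Primary down, mirror up. The probe of the primary fails, so the master probes the mirror and finds it alive. The master updates its metadata, has the mirror take over as primary (failover), and moves on to the next segment.
4. Primary down, mirror down. The probe of the primary fails, and so does the probe of the mirror. The master retries up to the maximum retry count, then stops probing this segment, leaves its metadata unchanged, and moves on to the next segment.

Cases 2-4 require running gprecoverseg to recover the segment. Segment instances that are marked as failed are skipped at startup.

Original link: https://blog.csdn.net/kjh2007abc/article/details/85001364
Reference: https://blog.51cto.com/u_13126942/2339755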
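The four cases can be written down as a small decision function. This is only a simplified model of the behaviour described above, not the actual FTS implementation; the names `Action`, `probe_segment` and `needs_gprecoverseg` are made up for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "both up: nothing to change"
    MARK_MIRROR_DOWN = "update master metadata: mirror is down"
    PROMOTE_MIRROR = "update master metadata and promote the mirror"
    GIVE_UP = "both down: stop probing, leave metadata unchanged"


def probe_segment(primary_up: bool, mirror_up: bool) -> Action:
    """Simplified model of one probe round for a primary/mirror pair."""
    if primary_up:
        # The primary reports its mirror's state back to the master.
        return Action.NONE if mirror_up else Action.MARK_MIRROR_DOWN
    # The primary is unreachable, so the master probes the mirror next.
    return Action.PROMOTE_MIRROR if mirror_up else Action.GIVE_UP


def needs_gprecoverseg(action: Action) -> bool:
    # Cases 2-4 leave a failed instance that gprecoverseg has to repair.
    return action is not Action.NONE


if __name__ == "__main__":
    for primary_up in (True, False):
        for mirror_up in (True, False):
            action = probe_segment(primary_up, mirror_up)
            print(primary_up, mirror_up, action.name, needs_gprecoverseg(action))
```

The failure in this article corresponds to the third case: the primaries on sdw3 were down and their mirrors on sdw1 took over.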
Fault handling

The following is the failure the author ran into when starting the database.

Starting the GP cluster
[gpadmin@mdw ~]$ gpstart -a
20210414:22:49:05:005365 gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args: -a
20210414:22:49:05:005365 gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20210414:22:49:05:005365 gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source'
20210414:22:49:05:005365 gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20210414:22:49:05:005365 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[INFO]:-Setting new master era
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[INFO]:-Master Started...
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[INFO]:-Shutting down master
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw3 directory /data01/gpdata/gpdatap1/gpseg4 <<<<<
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw3 directory /data01/gpdata/gpdatap2/gpseg5 <<<<<
20210414:22:49:06:005365 gpstart:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait.....
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-Process results...
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-   Successful segment starts                                            = 10
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   = 2    <<<<<<<<
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-Successfully started 10 of 10 segment instances, skipped 2 other segments
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[WARNING]:-There are 2 segment(s) marked down in the database
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance mdw directory /data01/gpdata/gpmaster/gpseg-1
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-Command pg_ctl reports Master mdw instance active
20210414:22:49:09:005365 gpstart:mdw:gpadmin-[INFO]:-Connecting to dbname='template1' connect_timeout=15
20210414:22:49:10:005365 gpstart:mdw:gpadmin-[INFO]:-Starting standby master
20210414:22:49:10:005365 gpstart:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw  in directory: /data01/gpdata/gpmaster/gpseg-1
20210414:22:49:11:005365 gpstart:mdw:gpadmin-[WARNING]:-Number of segments not attempted to start: 2
20210414:22:49:11:005365 gpstart:mdw:gpadmin-[INFO]:-Check status of database with gpstart utility
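The lines that matter in this output are the two "Skipping startup of segment marked down" warnings. As a rough sketch, they can be pulled out of a saved gpstart log with a regular expression. The helper name `skipped_segments` is hypothetical, and the pattern is based only on the wording seen in this particular log.

```python
import re
from typing import List, Tuple

# Pattern taken from the warning text in the log above.
SKIPPED = re.compile(
    r"Skipping startup of segment marked down in configuration: "
    r"on (?P<host>\S+) directory (?P<datadir>\S+)"
)


def skipped_segments(log_text: str) -> List[Tuple[str, str]]:
    """Return (host, data directory) for every segment gpstart skipped."""
    return [(m["host"], m["datadir"]) for m in SKIPPED.finditer(log_text)]


SAMPLE = (
    "20210414:22:49:06:005365 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of "
    "segment marked down in configuration: on sdw3 directory "
    "/data01/gpdata/gpdatap1/gpseg4 <<<<<\n"
    "20210414:22:49:06:005365 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of "
    "segment marked down in configuration: on sdw3 directory "
    "/data01/gpdata/gpdatap2/gpseg5 <<<<<\n"
)

if __name__ == "__main__":
    for host, datadir in skipped_segments(SAMPLE):
        print(host, datadir)
```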
Checking the startup status

Check the status of the database's mirror segments.

[gpadmin@mdw ~]$ gpstate -m
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source'
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb  5 2021 18:58:52'
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:--Type = Group
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                          Port    Status              Data Status
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data01/gpdata/gpdatam1/gpseg0   50000   Passive             Not In Sync
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data01/gpdata/gpdatam2/gpseg1   50001   Passive             Synchronized
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data01/gpdata/gpdatam1/gpseg2   50000   Passive             Not In Sync
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data01/gpdata/gpdatam2/gpseg3   50001   Passive             Not In Sync
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data01/gpdata/gpdatam1/gpseg4   50000   Acting as Primary   Not In Sync
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data01/gpdata/gpdatam2/gpseg5   50001   Acting as Primary   Not In Sync
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[WARNING]:-2 segment(s) configured as mirror(s) are acting as primaries
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[WARNING]:-2 mirror segment(s) acting as primaries are not synchronized
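The mirror table can also be read by a script. The sketch below assumes the output has one log entry per line and that columns are separated by at least two spaces, as in the `gpstate -m` output of this cluster; the function names are hypothetical and other Greenplum versions may lay the table out differently.

```python
import re
from typing import Dict, List

# One table row: host, data directory, port, status, data status.
ROW = re.compile(
    r":-\s+(?P<host>\S+)\s+(?P<datadir>/\S+)\s+(?P<port>\d+)\s+"
    r"(?P<status>\S.*?)\s{2,}(?P<data_status>\S.*?)\s*$"
)


def parse_mirrors(output: str) -> List[Dict[str, str]]:
    """Extract the mirror rows; the header row and other lines are ignored."""
    rows = []
    for line in output.splitlines():
        match = ROW.search(line)
        if match:
            rows.append(match.groupdict())
    return rows


def not_in_sync(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    return [r for r in rows if r["data_status"] != "Synchronized"]


def acting_as_primary(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    return [r for r in rows if r["status"] == "Acting as Primary"]


SAMPLE = """\
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                          Port    Status              Data Status
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data01/gpdata/gpdatam2/gpseg1   50001   Passive             Synchronized
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data01/gpdata/gpdatam1/gpseg2   50000   Passive             Not In Sync
20210414:22:49:58:006300 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data01/gpdata/gpdatam1/gpseg4   50000   Acting as Primary   Not In Sync
"""

if __name__ == "__main__":
    rows = parse_mirrors(SAMPLE)
    print("not in sync:", [r["datadir"] for r in not_in_sync(rows)])
    print("acting as primary:", [r["datadir"] for r in acting_as_primary(rows)])
```

Rows that are not synchronized indicate that recovery is still needed; rows acting as primary indicate that roles have not yet been switched back.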
How to recover the failed segments
How do we recover these primary/mirror segments? First, generate a recovery configuration file: gprecoverseg -o ./recov
[gpadmin@mdw ~]$ gprecoverseg -o ./recov
20210414:22:51:18:007797 gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./recov
20210414:22:51:19:007797 gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source'
20210414:22:51:19:007797 gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb  5 2021 18:58:52'
20210414:22:51:19:007797 gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20210414:22:51:20:007797 gprecoverseg:mdw:gpadmin-[INFO]:-Configuration file output to ./recov successfully.
Inspecting the recovery configuration file
The file shows which segments need to be recovered.
[gpadmin@mdw ~]$ cat recov
sdw3|40000|/data01/gpdata/gpdatap1/gpseg4
sdw3|40001|/data01/gpdata/gpdatap2/gpseg5
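Each line of the file is a pipe-separated host, port and data directory of a failed instance. A minimal sketch of reading it follows; `FailedSegment` and `parse_recovery_config` are hypothetical names. Skipping `#` comment lines and ignoring an optional second, space-separated group (a new recovery location) are assumptions that go beyond the two lines shown above.

```python
from typing import List, NamedTuple


class FailedSegment(NamedTuple):
    host: str
    port: int
    datadir: str


def parse_recovery_config(text: str) -> List[FailedSegment]:
    """Parse 'host|port|datadir' lines from a gprecoverseg -o file."""
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines carry no entry
        # Assumption: only the first whitespace-separated group describes
        # the failed instance; anything after it is ignored here.
        host, port, datadir = line.split()[0].split("|")
        segments.append(FailedSegment(host, int(port), datadir))
    return segments


RECOV = """\
sdw3|40000|/data01/gpdata/gpdatap1/gpseg4
sdw3|40001|/data01/gpdata/gpdatap2/gpseg5
"""

if __name__ == "__main__":
    for seg in parse_recovery_config(RECOV):
        print(seg.host, seg.port, seg.datadir)
```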
Recovering with the configuration file
[gpadmin@mdw ~]$ gprecoverseg -i ./recov
20210414:22:52:17:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -i ./recov
20210414:22:52:18:008924 gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source'
20210414:22:52:18:008924 gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb  5 2021 18:58:52'
20210414:22:52:18:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Heap checksum setting is consistent between master and the segments that are candidates for recoverseg
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Greenplum instance recovery parameters
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Recovery from configuration -i option supplied
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Recovery 1 of 2
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Synchronization mode                 = Incremental
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance host                 = sdw3
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance address              = sdw3
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance directory            = /data01/gpdata/gpdatap1/gpseg4
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance port                 = 40000
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance host        = sdw1
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance address     = sdw1
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance directory   = /data01/gpdata/gpdatam1/gpseg4
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance port        = 50000
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Target                      = in-place
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Recovery 2 of 2
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Synchronization mode                 = Incremental
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance host                 = sdw3
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance address              = sdw3
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance directory            = /data01/gpdata/gpdatap2/gpseg5
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Failed instance port                 = 40001
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance host        = sdw1
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance address     = sdw1
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance directory   = /data01/gpdata/gpdatam2/gpseg5
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Source instance port        = 50001
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:-   Recovery Target                      = in-place
20210414:22:52:19:008924 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
Continue with segment recovery procedure Yy|Nn (default=N):
> Y
20210414:22:52:24:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Starting to modify pg_hba.conf on primary segments to allow replication connections
20210414:22:52:30:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Successfully modified pg_hba.conf on primary segments to allow replication connections
20210414:22:52:30:008924 gprecoverseg:mdw:gpadmin-[INFO]:-2 segment(s) to recover
20210414:22:52:30:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped
20210414:22:52:31:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20210414:22:52:31:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Updating configuration with new mirrors
20210414:22:52:31:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Updating mirrors
20210414:22:52:31:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Running pg_rewind on required mirrors
sdw3 (dbid 6): Done!
sdw3 (dbid 7): Done!
20210414:22:52:49:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Starting mirrors
20210414:22:52:49:008924 gprecoverseg:mdw:gpadmin-[INFO]:-era is 8d2c5a3f6db9998a_210414224906
20210414:22:52:49:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait....
20210414:22:52:51:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Process results...
20210414:22:52:52:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Triggering FTS probe
20210414:22:52:52:008924 gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************
20210414:22:52:52:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Updating segments for streaming is completed.
20210414:22:52:52:008924 gprecoverseg:mdw:gpadmin-[INFO]:-For segments updated successfully, streaming will continue in the background.
20210414:22:52:52:008924 gprecoverseg:mdw:gpadmin-[INFO]:-Use  gpstate -s  to check the streaming progress.
20210414:22:52:52:008924 gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************
Checking the recovery status
[gpadmin@mdw ~]$ gpstate -m
20210414:22:53:00:009531 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source'
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.14.0 build commit:62d24f4a455276cab4bf2ca4538e96dcf58db8ba Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb  5 2021 18:58:52'
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:--Type = Group
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-   Mirror   Datadir                          Port    Status              Data Status
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data01/gpdata/gpdatam1/gpseg0   50000   Passive             Synchronized
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-   sdw2     /data01/gpdata/gpdatam2/gpseg1   50001   Passive             Synchronized
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data01/gpdata/gpdatam1/gpseg2   50000   Passive             Synchronized
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-   sdw3     /data01/gpdata/gpdatam2/gpseg3   50001   Passive             Synchronized
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data01/gpdata/gpdatam1/gpseg4   50000   Acting as Primary   Synchronized
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:-   sdw1     /data01/gpdata/gpdatam2/gpseg5   50001   Acting as Primary   Synchronized
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20210414:22:53:01:009531 gpstate:mdw:gpadmin-[WARNING]:-2 segment(s) configured as mirror(s) are acting as primaries
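gprecoverseg usage
gprecoverseg is the utility used to repair segments. It is fairly simple to use. The main options are:

-i: the main option. It specifies a configuration file that describes the segments to repair and where they should be placed after the repair.
-F: optional. When given, gprecoverseg deletes the instances listed in the "-i" file (or marked "d") and copies a complete instance from the surviving mirror to the target location.
-r: when FTS finds a primary down and fails over to its mirror, the mirror that is acting as primary does not switch back automatically after gprecoverseg has repaired the failed instance. Some hosts are then left with too many active segments, which causes a performance bottleneck. The segments therefore need to be returned to their original roles, which is called rebalancing.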
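Put together, the steps in this article form a short command sequence. The sketch below only builds the command lines and does not run them; `recovery_commands` is a hypothetical helper, and the ordering (rebalance last, after `gpstate -m` reports every mirror as Synchronized) is an assumption about sensible usage rather than something the original author spelled out.

```python
from typing import List


def recovery_commands(config_path: str, full: bool = False) -> List[List[str]]:
    """Build the gprecoverseg/gpstate command lines used in this article."""
    recover = ["gprecoverseg", "-i", config_path]
    if full:
        # -F: discard the failed instance and copy a complete one instead
        # of recovering incrementally.
        recover.append("-F")
    return [
        ["gprecoverseg", "-o", config_path],  # write the recovery config file
        recover,                              # recover the listed segments
        ["gpstate", "-m"],                    # check mirror status
        ["gprecoverseg", "-r"],               # rebalance roles afterwards
    ]


if __name__ == "__main__":
    for cmd in recovery_commands("./recov"):
        print(" ".join(cmd))
```

Returning argument lists rather than one shell string keeps each step easy to inspect or to hand to `subprocess.run` one at a time, so that the status check can gate the rebalance.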
