前言
MHA简介: MHA(Master HA)是一款开源的 MySQL 的高可用程序,它为 MySQL主从复制架构提供了 automating master failover (自动化主故障转移)功能。MHA 在监控到 master 节点故障时,会提升其中拥有最新数据的 slave 节点成为新的master 节点,在此期间,MHA 会通过于其它从节点获取额外信息来避免一致性方面的问题。MHA 还提供了 master 节点的在线切换功能,即按需切换 master/slave 节点。 MHA 是由日本人 yoshinorim(原就职于DeNA现就职于FaceBook)开发的比较成熟的 MySQL 高可用方案。MHA 能够在30秒内实现故障切换,并能在故障切换中,最大可能的保证数据一致性。目前淘宝也正在开发相似产品 TMHA, 目前已支持一主一从。
MHA工作原理和架构
MHA集群架构 {data-type=drawio data-xml=%3CmxGraphModel%20dx=%22747%22%20dy=%22394%22%20grid=%221%22%20gridSize=%2210%22%20guides=%221%22%20tooltips=%221%22%20connect=%221%22%20arrows=%221%22%20fold=%221%22%20page=%221%22%20pageScale=%221%22%20pageWidth=%22827%22%20pageHeight=%221169%22%3E%3Croot%3E%3CmxCell%20id=%220%22/%3E%3CmxCell%20id=%221%22%20parent=%220%22/%3E%3CmxCell%20id=%226%22%20style=%22edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;exitX=0;exitY=0;entryX=0.5;entryY=1;jettySize=auto;orthogonalLoop=1;%22%20edge=%221%22%20parent=%221%22%20source=%222%22%20target=%223%22%3E%3CmxGeometry%20relative=%221%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%227%22%20style=%22edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;exitX=0.5;exitY=0;entryX=0.5;entryY=1;jettySize=auto;orthogonalLoop=1;%22%20edge=%221%22%20parent=%221%22%20source=%222%22%20target=%224%22%3E%3CmxGeometry%20relative=%221%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%228%22%20style=%22edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;exitX=1;exitY=0;entryX=0.5;entryY=1;jettySize=auto;orthogonalLoop=1;%22%20edge=%221%22%20parent=%221%22%20source=%222%22%20target=%225%22%3E%3CmxGeometry%20relative=%221%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%222%22%20value=%22manager%22%20style=%22ellipse;whiteSpace=wrap;html=1;fillColor=#00FF00;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22310%22%20y=%22300%22%20width=%22120%22%20height=%2280%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%2211%22%20style=%22edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;exitX=0.85;exitY=0.05;exitPerimeter=0;entryX=0.5;entryY=0;jettySize=auto;orthogonalLoop=1;%22%20edge=%221%22%20parent=%221%22%20source=%223%22%20target=%224%22%3E%3CmxGeometry%20relative=%221%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%223%22%20value=%22master%22%20style=%22shape=cylinder;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;fillColor=#97D077;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22100%22%20y=%2280%22%20width=%2260%22%20height=%2280%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%224%22%20value=%22%22%20style=%22shape=cylinder;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;fillColor=#A9C4EB;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22340%22%20y=%2270%22%20width=%2260%22%20height=%2280%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%225%22%20value=%22%22%20style=%22shape=cylinder;whiteSpace=wrap;html=1;boundedLbl=1;backgroundOutline=1;fillColor=#7EA6E0;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22590%22%20y=%2270%22%20width=%2260%22%20height=%2280%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%229%22%20value=%22slave%22%20style=%22text;html=1;resizable=0;points=%5B%5D;autosize=1;align=left;verticalAlign=top;spacingTop=-4;fillColor=#FFE599;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22400%22%20y=%22130%22%20width=%2240%22%20height=%2220%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%2212%22%20value=%22%E6%95%85%E9%9A%9C%E8%BD%AC%E7%A7%BB%22%20style=%22text;html=1;resizable=0;points=%5B%5D;autosize=1;align=left;verticalAlign=top;spacingTop=-4;fillColor=#FF3333;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22185%22%20y=%2260%22%20width=%2260%22%20height=%2220%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%2213%22%20value=%22master%22%20style=%22text;html=1;resizable=0;points=%5B%5D;autosize=1;align=left;verticalAlign=top;spacingTop=-4;fillColor=#FFF4C3;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22375%22%20y=%2244%22%20width=%2250%22%20height=%2220%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3CmxCell%20id=%2214%22%20value=%22slave%22%20style=%22text;html=1;resizable=0;points=%5B%5D;autosize=1;align=left;verticalAlign=top;spacingTop=-4;fillColor=#FFE599;%22%20vertex=%221%22%20parent=%221%22%3E%3CmxGeometry%20x=%22650%22%20y=%22130%22%20width=%2240%22%20height=%2220%22%20as=%22geometry%22/%3E%3C/mxCell%3E%3C/root%3E%3C/mxGraphModel%3E}
MHA工作原理总结: (1) 获取从宕机崩溃的 master 保存二进制日志事件(binlog events); (2) 识别含有最新更新的 slave ; (3) 将差异的中继日志(relay log)应用到其他 slave ; (4) 将 master 保存的二进制日志事件(binlog events)应用到要提升为master节点的slave; (5) 将这 slave 只读模式解除并提升为新 master ,重新部署主从关系;
MHA软件组成
Manager工具包和Node工具包
Manager工具包
masterha_check_ssh 检查MHA的SSH配置状况
masterha_check_repl 检查MySQL复制状况
masterha_manger 启动MHA
masterha_check_status 检测当前MHA运行状态
masterha_master_monitor 检测master是否宕机
masterha_master_switch 故障转移(自动或手动)
masterha_conf_host 添加或删除配置的server信息
masterha_stop --conf=app1.cnf 停止MHA
masterha_secondary_check 两个或多个网络线路检查MySQL主服务器的可用
Node工具包:这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:
save_binary_logs #保存和复制master的二进制日志
apply_diff_relay_logs #识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog #去除不必要的ROLLBACK事件(MHA已不再使用此工具)
purge_relay_logs #清除中继日志(不会阻塞SQL线程)
自定义拓展
`secondary_check_script`:通过多条网络路由检测master的可用性; `master_ip_failover_script`:更新application使用的masterip; `report_script`:发送报告;
`init_conf_load_script`:加载初始配置参数; `master_ip_online_change_script`;更新master节点ip地址
配置文件
global配置,为各application提供默认配置,默认文件路径 /etc/masterha_default.cnf
application配置:为每个主从复制集群
实验配置
| 名称 | IP | 角色 | 备注 | | ------ | ------ | ------ | ------ | | MHA manager | 64.15 | manage控制器 | centos7 监控管理 | | master | 64.16 | 数据库主 | 开启bin-log | | slave1 | 64.18 | 数据库从1 | 开启bin-log | | slave2 | 64.200 | 数据库从2| 开启bin-log Ubuntu22.04|
!!#ff0000 注:实验各软件版本需要保持一致 避免发生没有必要的问题。!!
实验统一前准备
#关闭防火墙
systemctl disable --now firewalld
setenforce 0
ufw disable #ubantu关闭防火墙
#安装epel源
yum install epel-release.noarch -y
#如果源异常 可以替换为阿里源或者清华源等 提前备份 建议
wget -O /etc/yum.repos.d/CenOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
#上传node包 到除manager服务器
yum install (package name) -y
#注 ubantu 安装epel源之后 可以使用yum安装 常规情况下使用apt
#如果安装不上epel源 可以改一下变量 后重新获取安装软件 用apt重新安装
bash -c 'echo "deb http://www.percona.com/downloads/percona-release/ubuntu $(lsb_release -sc) main" > /etc/apt/sources.list.d/percona.list'
sudo apt-get update
apt-get install mha4mysql-node
#安装rsync
apt-get install rsync #ubantu安装
yum install rsync #centos安装
#做免密登录
rsync -a .ssh 192.168.64.16/root/
rsync -a .ssh 192.168.64.18/root/
rsync -a .ssh 192.168.64.200/root/
注意:ubantu默认是不允许 使用root账户登录ssh 需要修改配置文件 /etc/ssh/sshd_config
PermitRootLogin yes
usepam no
所有从服务器上MySQL需要新建用户 为了主从切换的时候 有用户执行操作。
create user test@'192.168.64.%' identified by "Admin@123";
grant replication slave on *.* to test@'192.168.64.%' identified by 'Admin@123';
MHA manager 服务器操作
主服务器 需要安装manager和node 两个工具包
#上传安装包 到对应安装包
mha4mysql-node-0.58-0.el7.centos.noarch.rpm
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@xyy1 opt]# yum -y install mha4mysql-*.rpm
mkdir /etc/mastermha
vim /etc/mastermha/app1.cnf
[server default]
user=mhauser
password=Admin@123
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=test
repl_password=Admin@123
ping_interval=1
master_ip_failover_script=/usr/local/bin/master_ip_failover
check_repl_delay=0
master_binlog_dir=/data/mysql/
[server1]
hostname=192.168.64.16
candidate_master=1
[server2]
hostname=192.168.64.18
candidate_master=1
[server3]
hostname=192.168.64.200
-----------------------------------------------------------
# mha用于访问数据库的账户和密码
user=mhauser
password=Admin@123
# 指定mha的工作目录
manager_workdir=/data/mastermha/app1/
# mha日志文件的存放路径
manager_log=/data/mastermha/app1/manager.log
# 指定mha在远程节点上的工作目录
remote_workdir=/data/mastermha/app1/
# 可以使用ssh登录的用户
ssh_user=root
# 用于主从复制的MySQL用户和密码
repl_user=test
repl_password=Admin@123
# 指定间隔多少秒检测一次
ping_interval=1
# 指定master节点存放binlog日志文件的目录 可选项
master_binlog_dir=/var/lib/mysql
# 指定一个脚本,该脚本实现了在主从切换之后,将虚拟IP漂移到新的Master上
master_ip_failover_script=/usr/bin/master_ip_failover
# 指定用于二次检查节点状态的脚本
secondary_check_script=/usr/bin/masterha_secondary_check -s
# 配置集群中的节点信息
[server1]
hostname=192.168.64.16
# 指定该节点可以参与Master选举
candidate_master=1
[server2]
hostname=192.168.64.18
candidate_master=1
[server3]
hostname=192.168.64.200
# 指定该节点不能参与Master选举
no_master=1
#切换脚本
[root@xyy1 /]# vim master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.64.99/24';
my $gateway = '192.168.64.2';
my $interface = 'ens33';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip = "/sbin/ifconfig $interface:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@xyy1 /]# cp master_ip_failover /usr/local/bin/
[root@xyy1 /]# chmod +x /usr/local/bin/master_ip_failover
配置主服务器 64.16
#修改主配置文件
[root@xyy1 ~]# vim /etc/my.cnf
[mysqld]
server_id=100
log-bin=/data/mysql/mysql-bin
skip_name_resolve=1
general_log
----------------------------------------------------------
server_id=100 #复制集群中的各节点的id均必须唯一
log-bin=/data/mysql/mysql-bin #开启二进制日志
skip_name_resolve=1 # 关闭名称解析
general_log #记录所有操作
relay-log = relay-log #开启中继日志
[root@xyy1 ~]# mysql -uroot -p123123
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 436 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.01 sec)
mysql> grant replication slave on *.* to test@'192.168.64.%' identified by 'Admin@123';
#建立复制用户
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> grant all on *.* to mhauser@'192.168.64.%' identified by 'Admin@123';
#建立 mha管理账户
ifconfig ens33:1 192.168.64.99/24
#设置虚拟地址
ifconfig
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.64.99 netmask 255.255.255.0 broadcast 192.168.64.255
ether 00:0c:29:6c:b6:1a txqueuelen 1000 (Ethernet)
配置从服务器1 64.18
vim /etc/my.cf
server_id=101
log-bin=/data/mysql/mysql-bin
read_only
relay_log_purge=0
skip_name_resolve=1
general_log
------------------------------------------------------
#read_only 开启只读
#relay_log_purge=0 开启中继日志
mkdir /data/mysql/ -p
chown mysql.mysql /data/ -R
systemctl restart mysqld
[root@xyy2 opt]# mysql -uroot -p123123
CHANGE MASTER TO
MASTER_HOST='192.168.64.16',
MASTER_USER='test',
MASTER_PASSWORD='Admin@123',
MASTER_PORT=3306,
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=436;
mysql> start slave;
mysql> show slave status\G;
配置从服务器2 64.200
apt安装的mysql数据库 配置文件位置
vim /etc/mysql/mysql.conf.d/mysqld.cnf
server_id=200
#唯一id
log-bin=/var/log/mysql/mysql-bin.log
#二进制日志文件位置 默认位置 建议不要自定义
read_only=1
#开启只读
relay-log=mysql-relay-bin
#中继日志位置
skip_name_resolve=1
general_log
systemctl restart mysqld
#登录mysql
CHANGE MASTER TO
-> MASTER_HOST='192.168.64.16',
-> MASTER_USER='test',
-> MASTER_PASSWORD='Admin@123',
-> MASTER_PORT=3306,
-> MASTER_LOG_FILE='mysql-bin.000001',
-> MASTER_LOG_POS=1036;
mysql> start slave;
mysql> show slave status\G;
查看状态时注意 I/O 线程和SQL线程 是否为yes
检测免密登录是否成功
在manager服务器上执行 显示都 ok了 基本就可以了。
masterha_check_ssh --conf=/etc/mastermha/app1.cnf
Tue Jul 9 22:38:16 2024 - [debug] Connecting via SSH from root@192.168.64.16(192.168.64.16:22) to root@192.168.64.18(192.168.64.18:22)..
Tue Jul 9 22:38:17 2024 - [debug] ok.
Tue Jul 9 22:38:17 2024 - [debug] Connecting via SSH from root@192.168.64.16(192.168.64.16:22) to root@192.168.64.200(192.168.64.200:22)..
Tue Jul 9 22:38:18 2024 - [debug] ok.
Tue Jul 9 22:38:19 2024 - [debug]
Tue Jul 9 22:38:17 2024 - [debug] Connecting via SSH from root@192.168.64.18(192.168.64.18:22) to root@192.168.64.16(192.168.64.16:22)..
Tue Jul 9 22:38:18 2024 - [debug] ok.
Tue Jul 9 22:38:18 2024 - [debug] Connecting via SSH from root@192.168.64.18(192.168.64.18:22) to root@192.168.64.200(192.168.64.200:22)..
Tue Jul 9 22:38:19 2024 - [debug] ok.
All SSH connection tests passed successfully. #提示这个说明可以了 成功。