#Case study: gradually migrating production to a k8s cluster - registering pods with consul

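#Project background

  • There are several business systems, and every node registers with the consul cluster so everything can be managed in one place
  • consul's DNS feature is used, so every node's hostname can be resolved and pinged
  • consul's health check feature is used, so a node is only added to a service after its health check passes
  • Some services call each other directly through the consul service address, i.e. service-name.service.datacenter.consul
  • prometheus monitoring uses consul-templates to add nodes automatically
  • The environment runs on Alibaba Cloud, where k8s container IPs and cloud host IPs can reach each other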
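
To make the DNS naming used above concrete, here is a minimal sketch that builds the names consul's DNS interface answers for. The helper names are hypothetical, the `consul` domain is assumed to be the default, and the example pod name is made up.

```python
def consul_service_dns(service: str, datacenter: str, domain: str = "consul") -> str:
    """Service lookup name: <service>.service.<datacenter>.<domain>."""
    return f"{service}.service.{datacenter}.{domain}"


def consul_node_dns(node: str, datacenter: str, domain: str = "consul") -> str:
    """Node lookup name: <node>.node.<datacenter>.<domain>."""
    return f"{node}.node.{datacenter}.{domain}"


if __name__ == "__main__":
    # "qa-consul-demo" and "qa" match the demo config later in this post.
    print(consul_service_dns("qa-consul-demo", "qa"))
    # Hypothetical pod hostname, used only for illustration.
    print(consul_node_dns("k8s-qa-consul-demo-abc12", "qa"))
```

A service lookup only returns instances whose health checks pass, which is why the health check item above matters for callers that resolve these names.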

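#1.1 Problem to solve

  • After some services move into the k8s cluster, services outside the cluster still need to reach them by connecting directly to pod IPs

#1.2 Solution

  • Add a consul-agent container to each pod so the pod registers with the consul cluster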

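#2.1 New problems caused by registering pods with consul

  • When a pod exits or is deleted, the consul cluster should remove it
  • The prometheus monitoring template in consul-templates needs to exclude pods

#2.2 Solution

  • The consul container uses a preStop hook that runs consul leave before exit, so the agent leaves the consul cluster on its own
  • Exclude pods in consul-templates
    • Add a prefix such as k8s- to the node name when a pod registers with the consul cluster
    • In consul-templates, use regexMatch to ignore nodes whose names start with k8s-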
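
For the graceful-leave item above, the `consul leave` command has an HTTP counterpart on the local agent, `PUT /v1/agent/leave`. The following is only a sketch of that call, not what the demo uses. The function names are hypothetical, and it assumes the agent listens on 127.0.0.1:8500 and that no ACL token is required.

```python
import urllib.request


def build_leave_request(agent_addr: str = "http://127.0.0.1:8500") -> urllib.request.Request:
    """Build (but do not send) the request that asks the local agent to leave."""
    return urllib.request.Request(f"{agent_addr}/v1/agent/leave", method="PUT")


def leave(agent_addr: str = "http://127.0.0.1:8500", timeout: float = 5.0) -> int:
    """Send the leave request and return the HTTP status code."""
    request = build_leave_request(agent_addr)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (needs a running agent, so it is left as a comment):
#   status = leave()
```

In a pod, running the CLI from the preStop hook is simpler because the consul binary is already in the container. The HTTP form is mainly useful when the caller is another container in the same pod.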

#Demo

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: consul-demo-config
  namespace: default
data:
  consul.json: |-

    {
      "datacenter": "qa",
      "acl_datacenter": "qa",
      "data_dir": "/tmp/consul",
      "bind_addr": "0.0.0.0",
      "client_addr": "0.0.0.0",
      "start_join": ["10.10.100.100"],
      "retry_join": ["10.10.100.100"],
      "retry_interval": "5s",
      "disable_host_node_id": true,
      "enable_script_checks": true,
      "disable_update_check": true,
      "leave_on_terminate": true,
      "log_level": "WARN",
      "server": false,
      "service": {
        "name": "qa-consul-demo",
        "port" : 80,
        "tags": ["k8s", "qa", "consul-demo"],
        "checks": [
          {
            "id": "consul-demo-HealthCheck",
            "name": "Health Check",
            "notes": "Health Check",
            "args": [ "sh", "-c", "[ $(curl -s 127.0.0.1 -I |grep 'nginx' |wc -l) -eq 1 ] && { echo 'Health check successful'; exit 0 ; } || { echo 'check error' ; exit 2 ; }" ],
            "interval": "10s"
          }
        ]
      }
    }

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: consul-demo
  namespace: default
spec:
  selector:
    matchLabels:
      app: consul-demo
  replicas: 2
  template:
    metadata:
      labels:
        app: consul-demo
    spec:

      imagePullSecrets:
      - name: docker-image-key

      containers:
      - name: consul-agent
        image: consul:1.0.8
        imagePullPolicy: IfNotPresent
        command:
        - sh
        - -c
        - |
          # The node name carries the k8s- prefix so the monitoring template can filter pods out
          consul agent -config-dir=/opt/consul -node=k8s-qa-$HOSTNAME -rejoin
        lifecycle:
          preStop:
            exec:
              command:
              - sh
              - -c
              - |
                # Leave the consul cluster gracefully before the container stops
                consul leave
        resources:
          requests:
            cpu: 10m
            memory: 16Mi
          limits:
            cpu: 50m
            memory: 32Mi
        readinessProbe:
          tcpSocket:
            port: 8500
        livenessProbe:
          tcpSocket:
            port: 8500
        volumeMounts:
        - name: consul-config
          mountPath: "/opt/consul"

      - name: nginx-node
        image: alivv/nginx:node
        imagePullPolicy: IfNotPresent

      volumes:
      - name: consul-config
        configMap:
          name: consul-demo-config
          items:
          - key: consul.json
            path: consul.json


The consul-templates monitoring template is as follows

  - job_name: 'node'
    static_configs:
{{range nodes}}
      - targets: ['{{.Node}}:9100']
        labels:
          instance: {{.Node}}{{end}}

After the change it looks like this, using regexMatch to exclude node names that start with k8s-

  - job_name: 'node'
    static_configs:
{{range nodes}}{{if .Node | regexMatch "^k8s-.*" }}{{else}}
      - targets: ['{{.Node}}:9100']
        labels:
          instance: {{.Node}}{{end}}{{end}}
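
To show what the modified template is expected to produce, the sketch below imitates its filtering in Python. It is an illustration only, not how the template engine renders: `render_node_job` is a hypothetical helper, the node names are made up, and whitespace may differ from the real output.

```python
import re

# Same pattern as the template; an unanchored search, as Go's regexp matching does.
K8S_NODE = re.compile(r"^k8s-.*")


def render_node_job(nodes, port: int = 9100) -> str:
    """Emit one static_configs entry per node, skipping k8s- prefixed nodes."""
    lines = ["  - job_name: 'node'", "    static_configs:"]
    for node in nodes:
        if K8S_NODE.search(node):
            continue  # pods registered with the k8s- prefix are not scraped
        lines.append(f"      - targets: ['{node}:{port}']")
        lines.append("        labels:")
        lines.append(f"          instance: {node}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node list: two cloud hosts and one pod.
    print(render_node_job(["web-01", "k8s-qa-consul-demo-abc12", "db-01"]))
```

Only the two cloud hosts end up in the job, so pods that come and go do not cause the scrape target list to change constantly.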
