
[TroubleShooting][Unresolved] Session drops when connecting to a DB through an NLB

Ersia 2022. 12. 11. 14:05

Session drops when connecting to a DB through an NLB

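This was a somewhat unusual setup: I heard that when a database was accessed through an AWS NLB, sessions occasionally dropped.
The team concluded that the problem came from connecting to the DB through the NLB and worked around it by connecting to the DB directly.
I tried to reproduce the issue on my own to find out why it happened.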

 

Issue summary

When a redundant pair of DBs was accessed through an AWS Network Load Balancer, sessions occasionally dropped.

 

Reproduction test

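I did not test with RDS or with fully synchronized databases. Instead, I installed the same PostgreSQL version on two arbitrary EC2 instances,
built the setup below with CloudFormation, and tested against it.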

CloudFormation template

# Select the key pair to use for the EC2 instances
Parameters:
  KeyName:
    Description: Name of an existing EC2 KeyPair to enable SSH access to the instances. Linked to AWS Parameter
    Type: AWS::EC2::KeyPair::KeyName
    ConstraintDescription: must be the name of an existing EC2 KeyPair.

# AWS resources for the PostgreSQL issue test
Resources:
# VPC
  DevPostgreSQLIssueTestVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.10.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestVPC
          
# Public subnet A
  DevPostgreSQLIssueTestPublicSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: 10.10.0.0/24
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestPublicSubnetA
          
# Public subnet C
  DevPostgreSQLIssueTestPublicSubnetC:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: 10.10.1.0/24
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestPublicSubnetC
          
# Private subnet A
  DevPostgreSQLIssueTestPrivateSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: 10.10.10.0/24
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestPrivateSubnetA
          
# Private subnet C
  DevPostgreSQLIssueTestPrivateSubnetC:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: 10.10.11.0/24
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestPrivateSubnetC

# Internet gateway (IGW)
  DevPostgreSQLIssueTestIGW:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestIGW

# Attach the IGW to the VPC
  DevPostgreSQLIssueTestIGWAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId: !Ref DevPostgreSQLIssueTestIGW
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      
# EIP for the NAT gateway
  DevPostgreSQLIssueTestNATEIP:
    Type: AWS::EC2::EIP

# Create the NAT gateway and associate it with the EIP and the subnet
  DevPostgreSQLIssueTestNATGateway:
    Type: AWS::EC2::NatGateway
    DependsOn:
      - DevPostgreSQLIssueTestIGWAttachment
      - DevPostgreSQLIssueTestPublicSubnetA
      - DevPostgreSQLIssueTestNATEIP
    Properties:
      AllocationId: !GetAtt DevPostgreSQLIssueTestNATEIP.AllocationId
      SubnetId: !Ref DevPostgreSQLIssueTestPublicSubnetA
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestNATGateway

# Route table for the public subnets
  DevPostgreSQLIssueTestPublicRT:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestPublicRT

# Add the default route (IGW) to the public route table
  DevPostgreSQLIssueTestPublicDefaultRoute:
    Type: AWS::EC2::Route
    DependsOn: DevPostgreSQLIssueTestIGWAttachment
    Properties:
      RouteTableId: !Ref DevPostgreSQLIssueTestPublicRT
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref DevPostgreSQLIssueTestIGW

# Associate the public route table with the public subnets
  DevPostgreSQLIssueTestPublicSubnetRouteTableAssociationA:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DevPostgreSQLIssueTestPublicRT
      SubnetId: !Ref DevPostgreSQLIssueTestPublicSubnetA

  DevPostgreSQLIssueTestPublicSubnetRouteTableAssociationC:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DevPostgreSQLIssueTestPublicRT
      SubnetId: !Ref DevPostgreSQLIssueTestPublicSubnetC

# Route table for the private subnets
  DevPostgreSQLIssueTestPrivateRT:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestPrivateRT

# Add the default route (NAT gateway) to the private route table
  DevPostgreSQLIssueTestPrivateDefaultRoute:
    Type: AWS::EC2::Route
    DependsOn: DevPostgreSQLIssueTestIGWAttachment
    Properties:
      RouteTableId: !Ref DevPostgreSQLIssueTestPrivateRT
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref DevPostgreSQLIssueTestNATGateway

# Associate the private route table with the private subnets
  DevPostgreSQLIssueTestPrivateSubnetRouteTableAssociationA:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DevPostgreSQLIssueTestPrivateRT
      SubnetId: !Ref DevPostgreSQLIssueTestPrivateSubnetA

  DevPostgreSQLIssueTestPrivateSubnetRouteTableAssociationC:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref DevPostgreSQLIssueTestPrivateRT
      SubnetId: !Ref DevPostgreSQLIssueTestPrivateSubnetC

# Security group for the EC2 instances
  DevPostgreSQLIssueTestEC2SG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP (80), PostgreSQL (5432), SSH (22), and ICMP from anywhere for the test
      VpcId: !Ref DevPostgreSQLIssueTestVPC
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestEC2SG
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: '80'
        ToPort: '80'
        CidrIp: 0.0.0.0/0
      - IpProtocol: tcp
        FromPort: '5432'
        ToPort: '5432'
        CidrIp: 0.0.0.0/0
      - IpProtocol: tcp
        FromPort: '22'
        ToPort: '22'
        CidrIp: 0.0.0.0/0
      - IpProtocol: icmp
        FromPort: -1
        ToPort: -1
        CidrIp: 0.0.0.0/0

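# PostgreSQL EC2 instances A and C
# Change the root password and enable password login so they are easy to reach from the bastion
# Install PostgreSQL and configure it to accept external connections
# Create one instance in zone A and one in zone C for NLB load balancing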
  DevPostgreSQLIssueTestEC2A:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: ami-03b42693dc6a7dc35
      KeyName: !Ref KeyName
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestEC2A
      NetworkInterfaces:
        - DeviceIndex: 0
          SubnetId: !Ref DevPostgreSQLIssueTestPrivateSubnetA
          GroupSet:
          - !Ref DevPostgreSQLIssueTestEC2SG
          AssociatePublicIpAddress: false
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            hostname DevPostgreSQLIssueTestEC2A
            (
            echo "test1234%"
            echo "test1234%"
            ) | passwd --stdin root
            sed -i "s/^PasswordAuthentication no/PasswordAuthentication yes/g" /etc/ssh/sshd_config
            sed -i "s/^#PermitRootLogin yes/PermitRootLogin yes/g" /etc/ssh/sshd_config
            service sshd restart            
            amazon-linux-extras install epel -y
            tee /etc/yum.repos.d/pgdg.repo<<"EOF"
            [pgdg13]
            name=PostgreSQL 13 for RHEL/CentOS 7 - x86_64
            baseurl=http://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7-x86_64
            enabled=1
            gpgcheck=0
            EOF
            yum install postgresql13 postgresql13-server -y
            /usr/pgsql-13/bin/postgresql-13-setup initdb
            systemctl enable --now postgresql-13
            systemctl status postgresql-13
            # Listen on all interfaces and accept IPv4 clients from any address (test only)
            sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '*'/g" /var/lib/pgsql/13/data/postgresql.conf
            sed -i "s/#port = 5432/port = 5432/g" /var/lib/pgsql/13/data/postgresql.conf
            sed -i "s=host    all             all             ::1/128=host    all             all             0.0.0.0/0=g" /var/lib/pgsql/13/data/pg_hba.conf
            su - postgres <<"EOPostgreSQL"
            psql -U postgres -c "create database testdb;"
            psql -U postgres -c "create user test with encrypted password 'test1234%';"
            psql -U postgres -c "grant all privileges on database testdb to test;"
            EOPostgreSQL
            systemctl restart postgresql-13
     
  DevPostgreSQLIssueTestEC2C:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: ami-03b42693dc6a7dc35
      KeyName: !Ref KeyName
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestEC2C
      NetworkInterfaces:
        - DeviceIndex: 0
          SubnetId: !Ref DevPostgreSQLIssueTestPrivateSubnetC
          GroupSet:
          - !Ref DevPostgreSQLIssueTestEC2SG
          AssociatePublicIpAddress: false
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            hostname DevPostgreSQLIssueTestEC2C
            (
            echo "test1234%"
            echo "test1234%"
            ) | passwd --stdin root
            sed -i "s/^PasswordAuthentication no/PasswordAuthentication yes/g" /etc/ssh/sshd_config
            sed -i "s/^#PermitRootLogin yes/PermitRootLogin yes/g" /etc/ssh/sshd_config
            service sshd restart            
            amazon-linux-extras install epel -y
            tee /etc/yum.repos.d/pgdg.repo<<"EOF"
            [pgdg13]
            name=PostgreSQL 13 for RHEL/CentOS 7 - x86_64
            baseurl=http://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7-x86_64
            enabled=1
            gpgcheck=0
            EOF
            yum install postgresql13 postgresql13-server -y
            /usr/pgsql-13/bin/postgresql-13-setup initdb
            systemctl enable --now postgresql-13
            systemctl status postgresql-13
            # Listen on all interfaces and accept IPv4 clients from any address (test only)
            sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '*'/g" /var/lib/pgsql/13/data/postgresql.conf
            sed -i "s/#port = 5432/port = 5432/g" /var/lib/pgsql/13/data/postgresql.conf
            sed -i "s=host    all             all             ::1/128=host    all             all             0.0.0.0/0=g" /var/lib/pgsql/13/data/pg_hba.conf
            su - postgres <<"EOPostgreSQL"
            psql -U postgres -c "create database testdb;"
            psql -U postgres -c "create user test with encrypted password 'test1234%';"
            psql -U postgres -c "grant all privileges on database testdb to test;"
            EOPostgreSQL
            systemctl restart postgresql-13

# Bastion server for access from outside
  DevPostgreSQLIssueTestBastionC:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t2.micro
      ImageId: ami-03b42693dc6a7dc35
      KeyName: !Ref KeyName
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestBastionC
      NetworkInterfaces:
        - DeviceIndex: 0
          SubnetId: !Ref DevPostgreSQLIssueTestPublicSubnetC
          GroupSet:
          - !Ref DevPostgreSQLIssueTestEC2SG
          AssociatePublicIpAddress: true
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            hostname DevPostgreSQLIssueTestBastionC

# EIP to attach to the bastion server
  DevPostgreSQLIssueTestBastionCEIP:
    Type: AWS::EC2::EIP
    Properties:
      InstanceId: !Ref DevPostgreSQLIssueTestBastionC
      
# NLB
  DevPostgreSQLIssueTestNLB:
    Type: "AWS::ElasticLoadBalancingV2::LoadBalancer"
    Properties:
      Type: "network"
      Scheme: "internet-facing"
      IpAddressType: "ipv4"
      SubnetMappings:
        - SubnetId: !Ref DevPostgreSQLIssueTestPublicSubnetA
        - SubnetId: !Ref DevPostgreSQLIssueTestPublicSubnetC
      LoadBalancerAttributes:
        - Key: "deletion_protection.enabled"
          Value: false
        - Key: "access_logs.s3.enabled"
          Value: false
        - Key: "load_balancing.cross_zone.enabled"
          Value: true
      Tags:
        - Key: Name
          Value: DevPostgreSQLIssueTestNLB
          
# Target group to attach to the NLB
  DevPostgreSQLIssueTestNLBTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Port: 5432
      Protocol: TCP
      Targets:
        - Id: !Ref DevPostgreSQLIssueTestEC2A
        - Id: !Ref DevPostgreSQLIssueTestEC2C
      TargetType: instance
      VpcId: !Ref DevPostgreSQLIssueTestVPC

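# NLB listener and the target group it forwards to
# Note: the listener accepts TCP on port 80 and forwards to the target group on port 5432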
  DevPostgreSQLIssueTestNLBListener1:
    Type: "AWS::ElasticLoadBalancingV2::Listener"
    Properties:
      LoadBalancerArn: !Ref DevPostgreSQLIssueTestNLB
      Protocol: "TCP"
      Port: 80
      DefaultActions:
        - Type: "forward"
          ForwardConfig:
            TargetGroups:
              - TargetGroupArn: !Ref DevPostgreSQLIssueTestNLBTargetGroup

# Outputs
Outputs:  
  DevPostgreSQLIssueTestEC2AIPAddress:
    Value: !GetAtt DevPostgreSQLIssueTestEC2A.PrivateIp
    Export:
      Name: "DevPostgreSQLIssueTestEC2AIP::Address"  
  DevPostgreSQLIssueTestBastionCEIPAddress:
    Value: !Ref DevPostgreSQLIssueTestBastionCEIP
    Export:
      Name: "DevPostgreSQLIssueTestBastionCEIP::Address"
  DevPostgreSQLIssueTestNLBDNSName:
    Value: !GetAtt DevPostgreSQLIssueTestNLB.DNSName
    Export:
      Name: "DevPostgreSQLIssueTestNLBDNSName::DNSName"
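
A minimal sketch of deploying the template above with the AWS CLI and reading back the NLB DNS name. The stack name, template file name, and the two function names are placeholders made up for illustration. Only the `KeyName` parameter and the output key `DevPostgreSQLIssueTestNLBDNSName` come from the template itself.

```shell
#!/bin/bash
# Sketch only: STACK_NAME and TEMPLATE_FILE are placeholder values, not from the original post.
STACK_NAME="${STACK_NAME:-dev-postgresql-issue-test}"
TEMPLATE_FILE="${TEMPLATE_FILE:-nlb-postgresql-test.yaml}"

# Create or update the stack. $1 is the name of an existing EC2 key pair,
# passed to the KeyName parameter that the template declares.
deploy_stack() {
  aws cloudformation deploy \
    --stack-name "$STACK_NAME" \
    --template-file "$TEMPLATE_FILE" \
    --parameter-overrides "KeyName=$1"
}

# Print a single stack output value. $1 is the OutputKey to look up.
stack_output() {
  aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --query "Stacks[0].Outputs[?OutputKey=='$1'].OutputValue" \
    --output text
}
```

Usage would be along the lines of `deploy_stack my-keypair` followed by `stack_output DevPostgreSQLIssueTestNLBDNSName`, where `my-keypair` is whatever key pair exists in the account.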

Normal connection sessions

After the setup was complete, the connection tests showed no session drops or other anomalies.

While checking various settings to reproduce the issue, I found the Cross-zone load balancing setting on the NLB target group (disabled by default on an NLB), so I enabled it and tested again.
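
The same setting can be switched from the AWS CLI instead of the console. This is a minimal sketch, assuming the target group ARN is known. The function name is hypothetical, and the attribute key is the same `load_balancing.cross_zone.enabled` key the template uses at the load balancer level.

```shell
#!/bin/bash
# Set cross-zone load balancing on a target group.
# $1: target group ARN (placeholder, supply your own)
# $2: true | false | use_load_balancer_configuration
set_target_group_cross_zone() {
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$1" \
    --attributes "Key=load_balancing.cross_zone.enabled,Value=$2"
}
```

Passing `use_load_balancer_configuration` makes the target group follow whatever is set on the NLB itself, which is convenient when comparing the two levels during a test.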

With this setting enabled, the sessions opened by a single program were connected separately to the DBs in zone A and zone C.
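
One way to observe this from the client side is to open several fresh connections through the NLB and ask each one which server answered. The sketch below is not the test I actually ran. It assumes `psql` is installed on the client, and that `NLB_HOST` and `DB_PASSWORD` are set in the environment (both variable names, like the function names, are made up here). The `test` user, the `testdb` database, and listener port 80 come from the template.

```shell
#!/bin/bash
# Open one new connection through the NLB and print the private IP
# of the PostgreSQL server that accepted it.
probe_backend() {
  PGPASSWORD="$DB_PASSWORD" psql -h "$NLB_HOST" -p "${NLB_PORT:-80}" \
    -U test -d testdb -Atc "select inet_server_addr();"
}

# Read one IP per line and print "<ip> <connection count>" per backend.
tally_backends() {
  sort | uniq -c | awk '{print $2, $1}'
}

# Open $1 consecutive connections and summarize where they landed.
run_probe() {
  i=0
  while [ "$i" -lt "$1" ]; do
    probe_backend
    i=$((i + 1))
  done | tally_backends
}
```

Running something like `run_probe 20` with cross-zone off and then on would show whether connections from one client spread over both instances. Each TCP connection is routed independently, so two sessions from the same program can land on different targets.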

 

Root cause details (speculation)

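The exact issue could not be reproduced, so the root cause is hard to pin down.
My guess is that, with Cross-zone enabled on the target group, a parent session and its child sessions were load-balanced to different targets.
In DBeaver's case the sessions are separate ones for metadata and for queries, so connecting to different targets caused no problem. Where the sessions have a parent-child relationship, however, I suspect a session could be dropped.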

 

Resolution (speculation)

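Enabling Cross-zone on the target group is the suspected cause, but the issue was not fully reproduced in testing,
and because the NLB has since been removed from the path and the issue no longer occurs, the exact fix is hard to confirm.
(When the issue occurred, it did not appear on a direct connection that bypassed the NLB, so it seems the team judged it to be an NLB problem and removed the NLB. It would have been better to test a few NLB settings first.)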
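
Before changing NLB settings one at a time, it helps to record what is currently configured. A minimal sketch using the AWS CLI, assuming the ARNs of the NLB and the target group are known (the function names are hypothetical):

```shell
#!/bin/bash
# List load balancer level attributes as "key<TAB>value" lines.
# $1: load balancer ARN (placeholder, supply your own)
show_nlb_attributes() {
  aws elbv2 describe-load-balancer-attributes \
    --load-balancer-arn "$1" \
    --query "Attributes[].[Key,Value]" --output text
}

# List target group level attributes as "key<TAB>value" lines.
# $1: target group ARN (placeholder, supply your own)
show_target_group_attributes() {
  aws elbv2 describe-target-group-attributes \
    --target-group-arn "$1" \
    --query "Attributes[].[Key,Value]" --output text
}
```

Saving both outputs before and after each change gives a record of which setting was active when a session dropped.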

 

 
