As the number of overcloud nodes managed by RHOSP13 Director grew, deployments started failing with a WebSocket timeout.


1. Deployment issue

When the WebSocket timeout occurred, the deploy script exited with an error, but luckily(?) heat-engine kept running and the deployment itself still went ahead.


2. Cause

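 - The zaqar log shows that the WebSocket connection was closed within 6 minutes:

 (INFO zaqar.transport.websocket.protocol [-] WebSocket connection closed: None)

 - The execution State did not reach SUCCESS within those 6 minutes, so the client timed out. (Reaching SUCCESS took about 10 minutes, i.e. the execution stayed RUNNING for about 10 minutes.)

 - In RHOSP13 this timeout is hard-coded, so it cannot be changed through a configuration file. I worked around it by modifying the source code directly.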
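The failure mode is a client that waits with a fixed deadline while the server-side workflow keeps running. The sketch below is a simplified simulation, not tripleoclient's code: WorkflowTimeout and wait_for_completion are hypothetical names, a simulated 10-second step stands in for waiting on WebSocket messages, and 600 seconds approximates the "about 10 minutes" the workflow needed.

```python
class WorkflowTimeout(Exception):
    """Raised when the client deadline passes while the workflow is RUNNING."""


def wait_for_completion(workflow_duration_s, timeout_s, step_s=10):
    """Simulate a client waiting on a workflow with a hard-coded timeout.

    workflow_duration_s: how long the (simulated) workflow stays RUNNING.
    timeout_s: the client-side timeout, e.g. 360 or 1200 seconds.
    Returns the final state, or raises WorkflowTimeout at the deadline.
    """
    elapsed = 0
    while elapsed < timeout_s:
        # The workflow finishes on its own schedule, independent of the client.
        state = "SUCCESS" if elapsed >= workflow_duration_s else "RUNNING"
        if state != "RUNNING":
            return state
        elapsed += step_s
    # The client gives up here, but the workflow would keep running.
    raise WorkflowTimeout("timed out after %d s; workflow still RUNNING"
                          % timeout_s)
```

With a 600-second workflow, wait_for_completion(600, 360) raises WorkflowTimeout while wait_for_completion(600, 1200) returns "SUCCESS". In both cases the simulated workflow finishes at the same moment; only the client's patience changes, which matches what happened here when heat-engine carried on after the script exited.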

[stack@director ~]$ ./deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 376e8664-b114-43ae-a63b-a60958fc3ee3
Waiting for messages on queue 'tripleo' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 58365e27-ab09-4605-89b3-68aa843a19fa
Plan updated.
Processing templates in the directory /tmp/tripleoclient-q2MVPu/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 298b61ff-d819-483e-8d7a-e8f0b7ea02c6
WARNING: Following parameters are defined but not used in plan. Could be possible that parameter is valid but currently not used.
  DockerAodhConfigImage
  DockerMistralExecutorImage
  < omitted >
  DockerSwiftConfigImage
  DockerNeutronL3AgentImage
Deploying templates in the directory /tmp/tripleoclient-q2MVPu/tripleo-heat-templates
Started Mistral Workflow tripleo.deployment.v1.deploy_plan. Execution ID: 82b76ab9-56b7-463a-b0bd-299bf52de893
Workflow Started
Timed out waiting for messages from Execution (ID: 82b76ab9-56b7-463a-b0bd-299bf52de893, State: RUNNING). The WebSocket timed out before the Workflow completed.

[stack@director ~]$
## When the WebSocket error above occurred, the execution was still in the RUNNING state


Every 10.0s: mistral execution-get 82b76ab9-56b7-463a-b0bd-299bf52de893                                                                           Thu Apr 22 16:48:22 2021

+--------------------+--------------------------------------+
| Field              | Value                                |
+--------------------+--------------------------------------+
| ID                 | 82b76ab9-56b7-463a-b0bd-299bf52de893 |
| Workflow ID        | d4b62d73-0448-4995-9b93-ef5c0c05bb33 |
| Workflow name      | tripleo.deployment.v1.deploy_plan    |
| Workflow namespace |                                      |
| Description        |                                      |
| Task Execution ID  | <none>                               |
| State              | RUNNING                              |
| State info         | None                                 |
| Created at         | 2021-04-22 07:40:53                  |
| Updated at         | 2021-04-22 07:40:53                  |
+--------------------+--------------------------------------+


After about 10 minutes, the state changed to SUCCESS.


Every 10.0s: mistral execution-get 82b76ab9-56b7-463a-b0bd-299bf52de893                                                                           Thu Apr 22 16:50:47 2021

+--------------------+--------------------------------------+
| Field              | Value                                |
+--------------------+--------------------------------------+
| ID                 | 82b76ab9-56b7-463a-b0bd-299bf52de893 |
| Workflow ID        | d4b62d73-0448-4995-9b93-ef5c0c05bb33 |
| Workflow name      | tripleo.deployment.v1.deploy_plan    |
| Workflow namespace |                                      |
| Description        |                                      |
| Task Execution ID  | <none>                               |
| State              | SUCCESS                              |
| State info         | None                                 |
| Created at         | 2021-04-22 07:40:53                  |
| Updated at         | 2021-04-22 07:50:31                  |
+--------------------+--------------------------------------+
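The Created at and Updated at fields of the two outputs give the actual run time. A quick Python check, assuming Updated at marks the moment the execution switched to SUCCESS:

```python
from datetime import datetime

# Timestamps copied from the two mistral execution-get outputs above.
FMT = "%Y-%m-%d %H:%M:%S"
created = datetime.strptime("2021-04-22 07:40:53", FMT)
updated = datetime.strptime("2021-04-22 07:50:31", FMT)

elapsed = int((updated - created).total_seconds())
print("elapsed: %d s (%d min %d s)" % (elapsed, elapsed // 60, elapsed % 60))

# Compare the run time with the old (360 s) and new (1200 s) client timeouts.
for timeout in (360, 1200):
    verdict = "completes in time" if elapsed <= timeout else "client times out"
    print("timeout %4d s: %s" % (timeout, verdict))
```

This prints an elapsed time of 578 s (9 min 38 s): well past the 360-second limit, and comfortably inside 1200 seconds.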


3. Solution (workaround)

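The workaround below is not a method recommended by Red Hat.

For reference, in RHOSP16.1 this value has already been raised to 1200, so I changed it to 1200 here as well.

Change the timeout argument (in seconds) passed to base.wait_for_messages from 360 to 1200.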
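Editing the file by hand in vi works. If the undercloud is rebuilt often, the same one-line change could also be scripted. The Python sketch below is my own helper, not part of tripleoclient: patch_timeout and the .orig backup name are hypothetical choices, and it assumes deployment.py contains the comment line and the wait_for_messages call laid out exactly as in the "before" snippet of this post. It refuses to write anything unless exactly one match is found. It runs on the undercloud's Python 2.7 as well as Python 3.

```python
import re
import shutil
import sys

# Matches only the deploy_plan wait loop in deployment.py: the comment line
# that precedes it, the base.wait_for_messages(...) call, and the hard-coded 360.
_PATTERN = re.compile(
    r"(# messages from the workflow\.[ \t]*\n"
    r"[ \t]*for payload in base\.wait_for_messages\("
    r"workflow_client, ws, execution,[ \t]*\n"
    r"[ \t]*)360\)"
)


def patch_timeout(source, new_timeout=1200):
    """Return source with the deploy_plan timeout replaced by new_timeout.

    Raises ValueError unless exactly one call matches, so a file whose
    layout differs from the expected one (or one already patched) is
    left alone.
    """
    patched, count = _PATTERN.subn(r"\g<1>%d)" % new_timeout, source)
    if count != 1:
        raise ValueError("expected 1 match, found %d" % count)
    return patched


def main(path):
    with open(path) as f:
        patched = patch_timeout(f.read())
    shutil.copy2(path, path + ".orig")  # backup copy next to the original
    with open(path, "w") as f:
        f.write(patched)


# Only touch a file when its path is given explicitly on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Usage (as root, with the script saved under a name of your choice, e.g. patch_timeout.py): python patch_timeout.py /usr/lib/python2.7/site-packages/tripleoclient/workflows/deployment.py. An update of the tripleoclient package will likely overwrite the file and undo the change, so re-check it after updates.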

## File to modify
[root@undercloud-0 workflows]# pwd
/usr/lib/python2.7/site-packages/tripleoclient/workflows
[root@undercloud-0 workflows]# vi deployment.py


## Before
...
        # The deploy workflow ends once the Heat create/update starts. This
        # means that is shouldn't take very long. Wait for 10 minutes for
        # messages from the workflow.
        for payload in base.wait_for_messages(workflow_client, ws, execution,
                                              360):  # <------ 360 s = 6 min, not the 10 min the comment says
            status = payload.get('status', 'RUNNING')
            if 'message' in payload and status == "RUNNING":
                print(payload['message'])

        if payload['status'] != "SUCCESS":
            pprint.pformat(payload)
            raise ValueError("Unexpected status %s for %s"
                             % (payload['status'], wf_name))


## After
        # The deploy workflow ends once the Heat create/update starts. This
        # means that is shouldn't take very long. Wait for 10 minutes for
        # messages from the workflow.
        for payload in base.wait_for_messages(workflow_client, ws, execution,
                                              1200):  # <------ changed from 360 (now 20 min)
            status = payload.get('status', 'RUNNING')
            if 'message' in payload and status == "RUNNING":
                print(payload['message'])

        if payload['status'] != "SUCCESS":
            pprint.pformat(payload)
            raise ValueError("Unexpected status %s for %s"
                             % (payload['status'], wf_name))
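Because the timeout only ends the client's wait (heat-engine kept working, as section 2 showed), the execution can also be followed independently of the WebSocket, the way the watch mistral execution-get command did above. A minimal Python sketch of that polling loop, assuming the mistral CLI accepts cliff's standard "-f value -c State" output options (check mistral help execution-get on your undercloud first). mistral_state and wait_until_done are hypothetical helper names, and the 1200-second default simply mirrors the new timeout.

```python
import subprocess
import sys
import time


def mistral_state(execution_id):
    """Return the State field of `mistral execution-get` (e.g. 'RUNNING').

    Assumes the CLI supports cliff's `-f value -c State` options.
    """
    out = subprocess.check_output(
        ["mistral", "execution-get", execution_id, "-f", "value", "-c", "State"])
    return out.decode().strip()


def wait_until_done(get_state, timeout_s=1200, interval_s=10,
                    sleep=time.sleep, clock=time.time):
    """Poll get_state() until it leaves RUNNING or timeout_s elapses.

    Returns the final state, or None if still RUNNING at the deadline.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state != "RUNNING":
            return state
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Example: python wait_execution.py <execution-id>
if __name__ == "__main__" and len(sys.argv) == 2:
    execution_id = sys.argv[1]
    print(wait_until_done(lambda: mistral_state(execution_id)))
```

Polling only the execution state avoids depending on the zaqar WebSocket at all, at the cost of missing the progress messages the deploy script normally prints.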