On an RHOSP 13 Director, deployments started failing with a WebSocket timeout as the number of overcloud nodes grew.
1. Deployment issue
When the WebSocket timeout occurred, the deployment script exited with an error. Fortunately, heat-engine kept running, so the deployment itself still went ahead.
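2. Cause
- The zaqar log shows the WebSocket connection being closed at the 6-minute mark
(INFO zaqar.transport.websocket.protocol [-] WebSocket connection closed: None)
- The timeout fires because the execution State does not reach SUCCESS within 6 minutes (it takes about 10 minutes to reach SUCCESS, so the execution stays in RUNNING for about 10 minutes)
- In RHOSP 13 the timeout value is hard-coded and cannot be changed through a configuration file, so I resolved the problem by editing the source code directly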
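To check for the same symptom on another undercloud, the zaqar log can be filtered for the close message quoted above. The following is only a minimal sketch: it assumes the log lines have already been read into a list, the function name `find_websocket_closes` is hypothetical, and the log file location is left out on purpose because it depends on the installation.

```python
import re

# Matches the close message quoted above from the zaqar log.
CLOSE_PATTERN = re.compile(
    r"zaqar\.transport\.websocket\.protocol .*WebSocket connection closed"
)


def find_websocket_closes(lines):
    """Return the log lines that record a closed WebSocket connection."""
    return [line.rstrip("\n") for line in lines if CLOSE_PATTERN.search(line)]


if __name__ == "__main__":
    sample = [
        # Line quoted in this post.
        "INFO zaqar.transport.websocket.protocol [-] "
        "WebSocket connection closed: None\n",
        # Placeholder that should not match (not real zaqar output).
        "placeholder line without the close message\n",
    ]
    for hit in find_websocket_closes(sample):
        print(hit)
```

Comparing the timestamps of the matching lines with the start time of the deployment is one way to see whether the connection was dropped around the 6-minute mark.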
[stack@director ~]$ ./deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 376e8664-b114-43ae-a63b-a60958fc3ee3
Waiting for messages on queue 'tripleo' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 58365e27-ab09-4605-89b3-68aa843a19fa
Plan updated.
Processing templates in the directory /tmp/tripleoclient-q2MVPu/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 298b61ff-d819-483e-8d7a-e8f0b7ea02c6
WARNING: Following parameters are defined but not used in plan. Could be possible that parameter is valid but currently not used.
DockerAodhConfigImage
DockerMistralExecutorImage
< snip >
DockerSwiftConfigImage
DockerNeutronL3AgentImage
Deploying templates in the directory /tmp/tripleoclient-q2MVPu/tripleo-heat-templates
Started Mistral Workflow tripleo.deployment.v1.deploy_plan. Execution ID: 82b76ab9-56b7-463a-b0bd-299bf52de893
Workflow Started
Timed out waiting for messages from Execution (ID: 82b76ab9-56b7-463a-b0bd-299bf52de893, State: RUNNING). The WebSocket timed out before the Workflow completed.
[stack@director ~]$
## At the time of the WebSocket error above, the execution was still in the RUNNING state
Every 10.0s: mistral execution-get 82b76ab9-56b7-463a-b0bd-299bf52de893 Thu Apr 22 16:48:22 2021
+--------------------+--------------------------------------+
| Field | Value |
+--------------------+--------------------------------------+
| ID | 82b76ab9-56b7-463a-b0bd-299bf52de893 |
| Workflow ID | d4b62d73-0448-4995-9b93-ef5c0c05bb33 |
| Workflow name | tripleo.deployment.v1.deploy_plan |
| Workflow namespace | |
| Description | |
| Task Execution ID | <none> |
| State | RUNNING |
| State info | None |
| Created at | 2021-04-22 07:40:53 |
| Updated at | 2021-04-22 07:40:53 |
+--------------------+--------------------------------------+
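Because the deploy script has already exited at this point, the execution has to be watched separately. As a rough sketch of that check, the code below pulls the State field out of a table like the one above and polls until it is no longer RUNNING. The names `parse_field` and `wait_until_finished` are hypothetical, and `fetch_table` is an assumed callable that returns the text of `mistral execution-get`; how that text is obtained is left to the caller.

```python
import time


def parse_field(table_text, field):
    """Extract one value from a two-column Field/Value table."""
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] == field:
            return cells[1]
    return None  # field not found


def wait_until_finished(fetch_table, interval=10, max_polls=120,
                        sleep=time.sleep):
    """Poll until State is no longer RUNNING and return the last state seen.

    Returns "RUNNING" if max_polls is exhausted, or None if the table
    could not be parsed.
    """
    for _ in range(max_polls):
        state = parse_field(fetch_table(), "State")
        if state != "RUNNING":
            return state
        sleep(interval)
    return "RUNNING"


if __name__ == "__main__":
    # Canned rows taken from the tables in this post.
    running = "| State | RUNNING |\n| State info | None |"
    success = "| State | SUCCESS |\n| State info | None |"
    tables = iter([running, running, success])
    print(wait_until_finished(lambda: next(tables), sleep=lambda s: None))
```

The `sleep` parameter is injectable only so the loop can be exercised without waiting.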
After about 10 minutes, the state changed to SUCCESS
Every 10.0s: mistral execution-get 82b76ab9-56b7-463a-b0bd-299bf52de893 Thu Apr 22 16:50:47 2021
+--------------------+--------------------------------------+
| Field | Value |
+--------------------+--------------------------------------+
| ID | 82b76ab9-56b7-463a-b0bd-299bf52de893 |
| Workflow ID | d4b62d73-0448-4995-9b93-ef5c0c05bb33 |
| Workflow name | tripleo.deployment.v1.deploy_plan |
| Workflow namespace | |
| Description | |
| Task Execution ID | <none> |
| State | SUCCESS |
| State info | None |
| Created at | 2021-04-22 07:40:53 |
| Updated at | 2021-04-22 07:50:31 |
+--------------------+--------------------------------------+
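The "Created at" and "Updated at" values in the second table give the actual run time of the workflow. A small sketch of that arithmetic, using only the timestamps shown above (the helper name `elapsed_seconds` is hypothetical):

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(created_at, updated_at):
    """Return the number of seconds between two execution timestamps."""
    start = datetime.strptime(created_at, TIME_FORMAT)
    end = datetime.strptime(updated_at, TIME_FORMAT)
    return int((end - start).total_seconds())


if __name__ == "__main__":
    elapsed = elapsed_seconds("2021-04-22 07:40:53", "2021-04-22 07:50:31")
    print(elapsed)          # 578 seconds, about 9 min 38 s
    print(elapsed > 360)    # True: longer than the 6-minute timeout
    print(elapsed <= 1200)  # True: fits within a 1200-second timeout
```

In this run the workflow needed 578 seconds, which is past the 360-second limit but comfortably inside 1200 seconds.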
3. Solution (workaround)
The workaround below is not a method recommended by Red Hat.
For reference, RHOSP 16.1 ships with this value set to 1200, so I used 1200 here as well.
Change the timeout passed to base.wait_for_messages from 360 to 1200.
## File to modify
[root@undercloud-0 workflows]# pwd
/usr/lib/python2.7/site-packages/tripleoclient/workflows
[root@undercloud-0 workflows]# vi deployment.py
## Before
...
# The deploy workflow ends once the Heat create/update starts. This
# means that is shouldn't take very long. Wait for 10 minutes for
# messages from the workflow.
for payload in base.wait_for_messages(workflow_client, ws, execution,  # <------
                                      360):
    status = payload.get('status', 'RUNNING')
    if 'message' in payload and status == "RUNNING":
        print(payload['message'])
if payload['status'] != "SUCCESS":
    pprint.pformat(payload)
    raise ValueError("Unexpected status %s for %s"
                     % (payload['status'], wf_name))
## After
# The deploy workflow ends once the Heat create/update starts. This
# means that is shouldn't take very long. Wait for 10 minutes for
# messages from the workflow.
for payload in base.wait_for_messages(workflow_client, ws, execution,
                                      1200):  # <----------------------
    status = payload.get('status', 'RUNNING')
    if 'message' in payload and status == "RUNNING":
        print(payload['message'])
if payload['status'] != "SUCCESS":
    pprint.pformat(payload)
    raise ValueError("Unexpected status %s for %s"
                     % (payload['status'], wf_name))
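To illustrate why raising the value helps, here is a simplified Python 3 simulation of a bounded wait. It is not the tripleoclient implementation: `wait_for_messages_sketch`, `WaitTimeout`, and `simulate` are hypothetical names, and the timeout is modeled as one overall deadline, which may differ from how tripleoclient actually applies it.

```python
import time


class WaitTimeout(Exception):
    """Raised when no terminal message arrives before the deadline."""


def wait_for_messages_sketch(receive, timeout, clock=time.monotonic):
    """Yield payloads until one is terminal or `timeout` seconds pass.

    Hypothetical stand-in used for illustration only.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        payload = receive()
        if payload is None:
            continue
        yield payload
        if payload.get("status", "RUNNING") != "RUNNING":
            return
    raise WaitTimeout("timed out after %s seconds" % timeout)


def simulate(workflow_seconds, timeout, step=10):
    """Run the sketch against a fake clock; return final status or TIMEOUT."""
    now = [0]

    def clock():
        return now[0]

    def receive():
        # Each receive call advances the fake clock by `step` seconds.
        now[0] += step
        done = now[0] >= workflow_seconds
        return {"status": "SUCCESS" if done else "RUNNING"}

    last = None
    try:
        for last in wait_for_messages_sketch(receive, timeout, clock):
            pass
    except WaitTimeout:
        return "TIMEOUT"
    return last["status"]


if __name__ == "__main__":
    print(simulate(578, 360))   # TIMEOUT
    print(simulate(578, 1200))  # SUCCESS
```

With a workflow that needs 578 seconds, as measured in this post, the 360-second limit expires first while the 1200-second limit lets the SUCCESS message through.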