chain getting halted when witness process is restarted multiple times before completely syncing with the latest block #316

Closed
opened 2022-03-10 18:53:12 +00:00 by prandnum · 9 comments
prandnum commented 2022-03-10 18:53:12 +00:00 (Migrated from gitlab.com)

The chain halts when the witness process is restarted multiple times before it has fully synced to the latest block. Once the blockchain (mainnet) is fully synced, multiple restarts of the witness node do not cause any issue.

2741761ms th_a fork_database.cpp:65 push_block ] Pushing block to fork database that failed to link: 01243493a950eb9951f42ab41ef0b6a6209e9271, 19149971
2741761ms th_a fork_database.cpp:66 push_block ] Head: 19149964, 0124348c18de432a54bfe188b741065feb946aa5
2741761ms th_a application.cpp:487 handle_block ] Error when pushing block:
3080000 unlinkable_block_exception: unlinkable block
block does not link to known chain
{}
th_a fork_database.cpp:86 _push_block

{"new_block":{"previous":"01243492a53f7c31e8370b5ed2c33ad7cccda6fb","timestamp":"2019-04-07T09:23:39","witness":"1.6.38","next_secret_hash":"1dc0ce9f75343b96861d3ad88015f2dcbe52afea","previous_secret":"b25128e0da934f8c7799930f27bd24067db8c176","transaction_merkle_root":"0000000000000000000000000000000000000000","extensions":[],"witness_signature":"1f05ef07108f72586b30fba1661e6549dca110ce792ddffde8b41708ba8e8de44b787f164847ee0124235eddc8d016d7167cd74ea48970349655079386ae7afd42","transactions":[]}}
th_a db_block.cpp:293 _push_block
2741761ms th_a fork_database.cpp:65 push_block ] Pushing block to fork database that failed to link: 0124349457d26cc54fd082e36c3af00b0f29876a, 19149972
2741761ms th_a fork_database.cpp:66 push_block ] Head: 19149964, 0124348c18de432a54bfe188b741065feb946aa5
2741762ms th_a application.cpp:487 handle_block ] Error when pushing block:
3080000 unlinkable_block_exception: unlinkable block
block does not link to known chain
{}
th_a fork_database.cpp:86 _push_block

{"new_block":{"previous":"01243493a950eb9951f42ab41ef0b6a6209e9271","timestamp":"2019-04-07T09:23:42","witness":"1.6.16","next_secret_hash":"61ecbe714f990a2ecc52201f0fc0c80e103f83a9","previous_secret":"438f222546c35a58ad46a25b6b3ad3b576fcfe52","transaction_merkle_root":"0000000000000000000000000000000000000000","extensions":[],"witness_signature":"207366daa1397a94af055ca61e3aa9d26d2fa1770dbba236de80cf37faf1a39a7d713c2b8f01a6cbd734f1e272fe16aff445f797f69689c8b0b430b588b4fd472c","transactions":[]}}
th_a db_block.cpp:293 _push_block
2747010ms asio main.cpp:168 operator() ] Caught SIGINT attempting to exit cleanly
2747011ms th_a main.cpp:181 main ] Exiting from signal 2
qa@PBSA-Dev:~/02032022/src/peerplays$
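
A note on reading the log above: in every quoted line, the first 8 hex digits of the block id equal the block number printed next to it (for example `01243493` is 19149971), which matches how Graphene-based chains build block ids. Assuming that holds, a minimal Python sketch can show how far the rejected block is from the local head. The function names are illustrative and not part of the Peerplays code base.

```python
def block_num_from_id(block_id: str) -> int:
    """Assumption: the first 4 bytes (8 hex digits) of a block id
    hold the block number in big-endian order."""
    return int(block_id[:8], 16)


def missing_blocks(head_num: int, previous_id: str) -> list:
    """Block numbers that would have to be known locally before a block
    whose `previous` field is `previous_id` can link to the head."""
    prev_num = block_num_from_id(previous_id)
    return list(range(head_num + 1, prev_num + 1))


# Values copied from the first error in the log above.
incoming_id = "01243493a950eb9951f42ab41ef0b6a6209e9271"
previous_id = "01243492a53f7c31e8370b5ed2c33ad7cccda6fb"
head_num = 19149964

gap = missing_blocks(head_num, previous_id)
print(block_num_from_id(incoming_id), len(gap), gap[0], gap[-1])
```

For the first error this reports that blocks 19149965 through 19149970 are absent between the head and the incoming block, which is consistent with the "block does not link to known chain" message.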

Steps

  1. Ran the witness_node process in a terminal (not as a daemon or service), i.e. ./witness_node, and pressed Ctrl+C to kill the process.
  2. Restarted the witness process 14 times; the issue was seen on the 15th restart.
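
The manual steps above could be automated roughly as follows. This is only a sketch: the run time between restarts and the shutdown timeout are assumptions, since the report does not say how long each run lasted, and `restart_repeatedly` is a hypothetical helper name.

```python
import signal
import subprocess
import time


def restart_repeatedly(cmd, restarts=15, run_seconds=60.0, stop_timeout=120.0):
    """Start `cmd`, interrupt it with SIGINT after `run_seconds`, and repeat.

    Returns the exit code of every run.
    """
    exit_codes = []
    for _ in range(restarts):
        proc = subprocess.Popen(cmd)
        time.sleep(run_seconds)
        # SIGINT is the signal a terminal sends on Ctrl+C.
        proc.send_signal(signal.SIGINT)
        try:
            exit_codes.append(proc.wait(timeout=stop_timeout))
        except subprocess.TimeoutExpired:
            # The process did not exit in time; force it so the loop can continue.
            proc.kill()
            exit_codes.append(proc.wait())
    return exit_codes
```

Calling `restart_repeatedly(["./witness_node"])` from the build directory would mirror the 15 restarts described in the report. The run time should be kept short enough that the node never finishes syncing.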

First Try:
logs.tar.gz

Second Try:
04032022-logs.tar.gz

terminal-logs.zip

prandnum commented 2022-03-10 18:53:55 +00:00 (Migrated from gitlab.com)

@bobinson @serkixenos @vampik

CC: @hbelakon

prandnum commented 2022-03-10 18:56:26 +00:00 (Migrated from gitlab.com)

mentioned in issue #235

bobinson commented 2022-03-11 13:38:33 +00:00 (Migrated from gitlab.com)

@prandnum - So originally we had an issue of not being able to resume the chain after a full sync, and that is now addressed, but we have an issue that causes corruption of the db / halting if the sync process is stopped before the sync completes?

prandnum commented 2022-03-11 18:54:02 +00:00 (Migrated from gitlab.com)

Yes, that is correct.

vampik commented 2022-03-11 19:10:17 +00:00 (Migrated from gitlab.com)

We no longer have corruption in the database.

We have changed the data we write to the db for several objects, which is what caused the corruption in the first place. Now we get an error with the fork db (unlinkable block) if we don't sync completely.

Like this:

1263453ms th_a fork_database.cpp:65 push_block ] Pushing block to fork database that failed to link: 01e3b25d2e257bf4730d60d1aed1703ed267a92d, 31699549
1263453ms th_a fork_database.cpp:66 push_block ] Head: 31699546, 01e3b25a1eb1e3a47e2193215833f0c0d159d30c
1263453ms th_a application.cpp:487 handle_block ] Error when pushing block:
3080000 unlinkable_block_exception: unlinkable block
block does not link to known chain
{}
th_a fork_database.cpp:86 _push_block

{"new_block":{"previous":"01e3b25c3af3687a9ad565782fcddae76a92d435","timestamp":"2020-07-01T20:52:15","witness":"1.6.13","next_secret_hash":"7157edc4fb1f57868f2bfaa35e4a481e0a42b614","previous_secret":"44e1ef55c0b1117de6c259e3571bb649d36ab189","transaction_merkle_root":"0000000000000000000000000000000000000000","extensions":[],"witness_signature":"1f6ff8fdc6b61e3e68e6250794b3e38c0e1b49d0570cdbb13c26b2d39f241be8f2232dc809781c7d650671893539f7e4597588cf8a890a8dcfa03aeb7118853a6c","transactions":[]}}
th_a db_block.cpp:293 _push_block
serkixenos commented 2022-03-11 20:19:39 +00:00 (Migrated from gitlab.com)

Forking means that the DBs on the remote and local nodes are not exactly the same.

Given that the database is synced properly BEFORE the software is interrupted, we know that the databases ARE exactly the same while syncing. So it looks like something on software start corrupts the database.

Adding logs to push_block to see which block numbers are being processed is a good way to start. Let's check the number of the last block processed on CTRL+C and the first block processed on software start, especially when CTRL+C causes a few blocks to be undone (when this happens, you will see a lot of messages in the witness log).
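
Once such logging exists, the comparison suggested above could be scripted along these lines. This is a sketch only: the `applying block N` message is a hypothetical format for the extra push_block log line and must be adjusted to whatever is actually added. The `Exiting from signal` marker is taken from the witness log quoted in this issue.

```python
import re

# Hypothetical format of the extra push_block log line (an assumption).
BLOCK_RE = re.compile(r"push_block \] applying block (\d+)")
# Shutdown marker as it appears in the witness log quoted in this issue.
EXIT_MARKER = "Exiting from signal"


def restart_boundaries(log_lines):
    """Return one (last block before shutdown, first block after restart)
    pair for each restart found in the log."""
    runs, current = [], []
    for line in log_lines:
        match = BLOCK_RE.search(line)
        if match:
            current.append(int(match.group(1)))
        elif EXIT_MARKER in line:
            # A shutdown ends the current run.
            runs.append(current)
            current = []
    runs.append(current)
    # Ignore runs in which no block was processed at all.
    runs = [run for run in runs if run]
    return [(before[-1], after[0]) for before, after in zip(runs, runs[1:])]
```

A pair where the second number is more than one above the first would mean that blocks were skipped across the restart. A second number at or below the first would be consistent with blocks being undone on shutdown and replayed on start.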

hirunda commented 2022-05-16 15:08:18 +00:00 (Migrated from gitlab.com)

assigned to @hirunda

prandnum commented 2022-06-03 19:44:30 +00:00 (Migrated from gitlab.com)

@hirunda please set the state to testing so that I can close the bug.

CC: @serkixenos

prandnum commented 2022-06-06 17:26:25 +00:00 (Migrated from gitlab.com)

Working without any issues.

prandnum (Migrated from gitlab.com) closed this issue 2022-06-06 17:26:26 +00:00
Reference: Peerplays_Blockchain/peerplays_migrated#316