Witness node crashes and not coming back even with --replay #268

Closed
opened 2022-02-02 20:37:23 +00:00 by bgerze · 8 comments
bgerze commented 2022-02-02 20:37:23 +00:00 (Migrated from gitlab.com)

Hello @serkixenos, @rilesdun
While I was testing a SON operation, sending TBD (test Hive) from the baris12 account to sonaccount09, witness_node suddenly stopped responding: it kept trying to send the transaction to the hive node, and port 8090 was not replying. This was surprising because my dev system had been running stably for the last 1-1.5 months without even a reboot. The interesting part is that although witness_node was sending the transaction to hive repeatedly, it was not listening on port 8090. I restarted witness_node and it still does not listen properly on port 8090. Even assuming something is wrong with hived, witness_node should at least tell me that it cannot connect to hived right now. Later @rilesdun came to my system and reinitialized the chain.

Here is the log message:

abase.cpp","line":1187,"method":"push_transaction","hostname":"","timestamp":"2022-02-01T19:24:10"},"format":"","data":{"trx":{"ref_block_num":32198,"ref_block_prefix":4278709794,"expiration":"2022-02-01T10:40:27","operations":[{"type":"transfer_operation","value":{"from":"son-account","to":"sonaccount09","amount":{"amount":"1000","precision":3,"nai":"@@000000013"},"memo":""}}],"extensions":[],"signatures":["205b80c7d844bd2c9a1fad0e734b0948907dca50c5ca79f6ea4766fcfab47071ec7201efd2a05911cfa33552194d94188b3b8e19e6279c8dad9edfab53dcdd74e4","207cc5ad1ddfe47e35a73b19f06b581da6b810651f176cc345dad7ca98daaf73394e4fc2fab3d858b35161226434df9ed79d0eda656358f2524e7fcc2fa6348549","1f56dcb35978ba6ec97b9ab48f552f288f81b304c04e42c7b8e940a218c3f8f1d314f2958821f08a0eed801ed0307db88febfd5426860d958ec4ec327b1698b7d9","203ff3f302cc327309f44b6990f5c336a907e4ae0ab1d24be3c04619ac4545630544f73aeb7f7535b54f35917f9bdb6141e3fa5efc316f21eaeadd05457bfa8049","20234d5f3b400726af576b3c1d8ee95d5fac47aef6675259c380a76a5ab60a95b71076df5c40b87e40f347e122b44a2fd07156b27f74ebd4118655ff70f69cbec8"]}}}]}},"id":773}
1450087ms th_a sidechain_net_handler.cpp:545 operator() ] Sending proposal for sidechain transaction send operation failed with exception Assert Exception
1450087ms th_a sidechain_net_handler.cpp:524 operator() ] Sidechain transaction to send: 1.39.25

So far I cannot give you more details. To reproduce the error, all we can do is send a wrong transaction to hive through the Peerplays SON functionality and shut down hived.

bgerze commented 2022-02-02 20:37:24 +00:00 (Migrated from gitlab.com)

assigned to @serkixenos and @rilesdun

serkixenos commented 2022-02-02 22:18:24 +00:00 (Migrated from gitlab.com)

@bgerze

"timestamp":"2022-02-01T19:24:10"

"expiration":"2022-02-01T10:40:27"

Looks like your transaction expired a long time ago... Is this the case?

However, this is not the reason for the node not recovering after restart and not listening on the appropriate port.

Share your config files and the output of "netstat -altn" from your computer while the software is running.
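
For reference, the expiry comparison on the two values quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `expired_by` is made up here, and it assumes both timestamps are in the same time zone (presumably UTC), which the log does not state.

```python
from datetime import datetime, timedelta

# Values copied from the log excerpt in this issue.
LOG_TIMESTAMP = "2022-02-01T19:24:10"
TRX_EXPIRATION = "2022-02-01T10:40:27"


def expired_by(logged_at: str, expiration: str) -> timedelta:
    """Return how long after `expiration` the log entry was written.

    A positive result means the transaction had already expired when the
    entry was logged. Assumption: both strings use the same time zone.
    """
    fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(logged_at, fmt) - datetime.strptime(expiration, fmt)


if __name__ == "__main__":
    delta = expired_by(LOG_TIMESTAMP, TRX_EXPIRATION)
    # Prints: expired: True, by 8:43:43
    print(f"expired: {delta > timedelta(0)}, by {delta}")
```

Under that assumption, the transaction was already more than eight hours past its expiration when the push was logged.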

bgerze commented 2022-02-03 08:36:10 +00:00 (Migrated from gitlab.com)

Hello,
Here are details:

LOG FILE:

MY_LOG_err.txt

genesis.json

bgerze commented 2022-02-03 08:37:14 +00:00 (Migrated from gitlab.com)

vampie@vampie-BHYVE101:~$ netstat -tulnp

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:28090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:28091 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8332 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:18444 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN -
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 ::1:631 :::* LISTEN -
tcp6 0 0 :::5432 :::* LISTEN -
tcp6 0 0 :::3000 :::* LISTEN -
tcp6 0 0 :::28090 :::* LISTEN -
tcp6 0 0 :::28091 :::* LISTEN -
tcp6 0 0 :::11111 :::* LISTEN -
tcp6 0 0 :::5000 :::* LISTEN -
tcp6 0 0 :::8332 :::* LISTEN -
tcp6 0 0 :::18444 :::* LISTEN -
tcp6 0 0 :::80 :::* LISTEN -
udp 0 0 10.11.12.1:45690 0.0.0.0:* 59637/chrome --enab
udp 0 0 0.0.0.0:57906 0.0.0.0:* -
udp 0 0 127.0.0.53:53 0.0.0.0:* -
udp 0 0 10.11.12.1:123 0.0.0.0:* -
udp 0 0 157.90.6.22:123 0.0.0.0:* -
udp 0 0 127.0.0.1:123 0.0.0.0:* -
udp 0 0 0.0.0.0:123 0.0.0.0:* -
udp 0 0 0.0.0.0:631 0.0.0.0:* -
udp 0 0 157.90.6.22:34845 0.0.0.0:* 59637/chrome --enab
udp 0 0 224.0.0.251:5353 0.0.0.0:* 59637/chrome --enab
udp 0 0 224.0.0.251:5353 0.0.0.0:* 59637/chrome --enab
udp 0 0 0.0.0.0:5353 0.0.0.0:* -
udp6 0 0 fe80::8400:96ff:fe5:123 :::* -
udp6 0 0 fe80::4c49:57ff:fe4:123 :::* -
udp6 0 0 fe80::b002:5eff:fe4:123 :::* -
udp6 0 0 fe80::d827:eaff:fe2:123 :::* -
udp6 0 0 fe80::42:f8ff:fe23::123 :::* -
udp6 0 0 fe80::b42:df26:1e11:123 :::* -
udp6 0 0 ::1:123 :::* -
udp6 0 0 :::123 :::* -
udp6 0 0 :::37873 :::* -
udp6 0 0 :::5353 :::* -
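
A listing like this can also be checked for a given port with a small script. The sketch below assumes the `netstat -tulnp` column layout shown above; the function name `listening_tcp_ports` is illustrative, and the sample contains only a few lines copied from the output. In the pasted output, port 8090 does not appear in LISTEN state, while 28090 and 28091 do.

```python
from typing import Set

# A few lines copied from the netstat output above (not the full listing).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:28090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:28091 0.0.0.0:* LISTEN -
tcp6 0 0 ::1:631 :::* LISTEN -
tcp6 0 0 :::28090 :::* LISTEN -
udp 0 0 127.0.0.53:53 0.0.0.0:* -
"""


def listening_tcp_ports(netstat_output: str) -> Set[int]:
    """Collect the TCP ports reported in LISTEN state."""
    ports: Set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        # rsplit handles both "0.0.0.0:22" and IPv6 forms such as ":::22".
        port = fields[3].rsplit(":", 1)[1]
        if port.isdigit():
            ports.add(int(port))
    return ports


if __name__ == "__main__":
    ports = listening_tcp_ports(SAMPLE)
    print("8090 listening:", 8090 in ports)
    print("28090 listening:", 28090 in ports)
```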

serkixenos commented 2022-02-23 00:49:07 +00:00 (Migrated from gitlab.com)

@bgerze is this still an issue?

serkixenos commented 2022-02-23 00:50:25 +00:00 (Migrated from gitlab.com)

If you restarted witness_node with blockchain replay, it will not start listening until the replay is done. Was this the case?
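
One way to tell a replay that is still running apart from a port that never opens is to poll the port for a while after the restart. The following is a minimal sketch, not part of witness_node: the helper name `wait_for_port` and the timeout values are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` elapses.

    Returns True as soon as the port accepts a connection, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused or timed out; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 8090, timeout=600)` would wait up to ten minutes for the port mentioned in this issue; the host and the timeout are guesses and depend on the local config and on how long a replay takes.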

bgerze commented 2022-02-23 12:29:54 +00:00 (Migrated from gitlab.com)

We cleared the DB and restarted the node with the help of @rilesdun, so the case is closed. However, I strongly advise testing the SON plugin for graphene carefully: it may cause DB corruption and/or some inconsistency, which results in port 8090 not listening.

bgerze commented 2022-02-23 12:31:49 +00:00 (Migrated from gitlab.com)

No, this was not the case. After Hive testing with SON, it suddenly core dumped on some transactions and refused to start. Our file system was ZFS, and the machine had 56 GB of RAM (more than 30 GB free) and a Xeon CPU with ECC RAM. This case did not come up again.

serkixenos (Migrated from gitlab.com) closed this issue 2022-02-23 13:39:57 +00:00
Reference: Peerplays_Blockchain/peerplays_migrated#268