FC library test fc_thread/leave_mutex_locked randomly fails #330

Closed
opened 2022-03-23 22:33:16 +00:00 by serkixenos · 7 comments
serkixenos commented 2022-03-23 22:33:16 +00:00 (Migrated from gitlab.com)

https://gitlab.com/PBSA/peerplays/-/jobs/2242637428

```
unknown location(0): fatal error: in "fc_thread/leave_mutex_locked": signal: SIGSEGV, si_code: 0 (memory access violation at address: 0x00000080)
/home/gitlab-runner/builds/JN7Eq9yZ/0/PBSA/peerplays/libraries/fc/tests/thread/task_cancel.cpp(12): last checkpoint: "leave_mutex_locked" test entry
*** 1 failure is detected in the test module "AllTests"
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit status 1
```
serkixenos commented 2022-03-23 22:35:54 +00:00 (Migrated from gitlab.com)

assigned to @pavel.baykov
pavel.baykov commented 2022-03-25 12:34:30 +00:00 (Migrated from gitlab.com)

Executed many times with no issues; however, valgrind shows a potential error:

```
bunker@pavel-tests:~/issue330/peerplays/build/libraries/fc/tests$ valgrind --leak-check=full ./task_cancel_test 
==2839377== Memcheck, a memory error detector
==2839377== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2839377== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==2839377== Command: ./task_cancel_test
==2839377== 
Running 7 test cases...
==2839377== Warning: client switching stacks?  SP change: 0x1ffeffeba0 --> 0x503adc0
==2839377==          to suppress, use: --max-stackframe=137338043872 or greater
==2839377== Warning: client switching stacks?  SP change: 0x503a750 --> 0x53bf000
==2839377==          to suppress, use: --max-stackframe=3688624 or greater
==2839377== Warning: client switching stacks?  SP change: 0x53bed00 --> 0x503a750
==2839377==          to suppress, use: --max-stackframe=3687856 or greater
==2839377==          further instances of this message will not be shown.
626697ms th_a       task_cancel.cpp:245           test_method          ] 9 canceled_exception: Canceled
task simple_task canceled before starting, reason canceling scheduled task to test if cancel works
    {"description":"simple_task","reason":"canceling scheduled task to test if cancel works"}
    th_a  task.cpp:48 run_impl

*** No errors detected
==2839377== 
==2839377== HEAP SUMMARY:
==2839377==     in use at exit: 4,195,791 bytes in 22 blocks
==2839377==   total heap usage: 3,533 allocs, 3,511 frees, 4,511,663 bytes allocated
==2839377== 
==2839377== 2,097,152 bytes in 1 blocks are possibly lost in loss record 21 of 22
==2839377==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2839377==    by 0x32B53F: boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>::allocate(boost::coroutines::stack_context&, unsigned long) (standard_stack_allocator.hpp:42)
==2839377==    by 0x325C56: fc::context::context(void (*)(boost::context::detail::transfer_t), boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>&, fc::thread*) (context.hpp:86)
==2839377==    by 0x32844E: fc::thread_d::start_next_fiber(bool) (thread_d.hpp:442)
==2839377==    by 0x329C42: fc::thread_d::yield_until(fc::time_point const&, bool) (thread_d.hpp:702)
==2839377==    by 0x31F044: fc::thread::sleep_until(fc::time_point const&) (thread.cpp:276)
==2839377==    by 0x320015: fc::usleep(fc::microseconds const&) (thread.cpp:360)
==2839377==    by 0x2D9260: fc_thread::leave_mutex_locked::test_method() (task_cancel.cpp:21)
==2839377==    by 0x2D8EA2: fc_thread::leave_mutex_locked_invoker() (task_cancel.cpp:12)
==2839377==    by 0x2E4A2F: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117)
==2839377==    by 0x47EAD1: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (in /home/bunker/issue330/peerplays/build/libraries/fc/tests/task_cancel_test)
==2839377==    by 0x47E824: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (in /home/bunker/issue330/peerplays/build/libraries/fc/tests/task_cancel_test)
==2839377== 
==2839377== 2,097,152 bytes in 1 blocks are possibly lost in loss record 22 of 22
==2839377==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2839377==    by 0x32B53F: boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>::allocate(boost::coroutines::stack_context&, unsigned long) (standard_stack_allocator.hpp:42)
==2839377==    by 0x325C56: fc::context::context(void (*)(boost::context::detail::transfer_t), boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>&, fc::thread*) (context.hpp:86)
==2839377==    by 0x32844E: fc::thread_d::start_next_fiber(bool) (thread_d.hpp:442)
==2839377==    by 0x329C42: fc::thread_d::yield_until(fc::time_point const&, bool) (thread_d.hpp:702)
==2839377==    by 0x31F044: fc::thread::sleep_until(fc::time_point const&) (thread.cpp:276)
==2839377==    by 0x320015: fc::usleep(fc::microseconds const&) (thread.cpp:360)
==2839377==    by 0x2D9194: fc_thread::leave_mutex_locked::test_method()::{lambda()#1}::operator()() const (task_cancel.cpp:19)
==2839377==    by 0x2E004F: fc::detail::void_functor_run<fc_thread::leave_mutex_locked::test_method()::{lambda()#1}>::run(void*, fc::detail::void_functor_run<fc_thread::leave_mutex_locked::test_method()::{lambda()#1}>) (task.hpp:83)
==2839377==    by 0x338D9D: fc::task_base::run_impl() (task.cpp:43)
==2839377==    by 0x338D19: fc::task_base::run() (task.cpp:32)
==2839377==    by 0x328FC3: fc::thread_d::run_next_task() (thread_d.hpp:520)
==2839377== 
==2839377== LEAK SUMMARY:
==2839377==    definitely lost: 0 bytes in 0 blocks
==2839377==    indirectly lost: 0 bytes in 0 blocks
==2839377==      possibly lost: 4,194,304 bytes in 2 blocks
==2839377==    still reachable: 1,487 bytes in 20 blocks
==2839377==         suppressed: 0 bytes in 0 blocks
==2839377== Reachable blocks (those to which a pointer was found) are not shown.
==2839377== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2839377== 
==2839377== For lists of detected and suppressed errors, rerun with: -s
==2839377== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
```
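A note on reading this output: the "client switching stacks?" warnings and both "possibly lost" records point at Boost coroutine stacks (`basic_standard_stack_allocator::allocate`), which valgrind cannot track by default because fc fibers switch SP between heap-allocated stacks. A hedged re-run sketch, using only flags valgrind itself suggests above (the threshold value is the one valgrind printed for this run):

```shell
# Suppress the stack-switch warnings and show every leak kind, so real
# losses can be separated from live-but-unregistered fiber stacks.
valgrind --leak-check=full \
         --show-leak-kinds=all \
         --max-stackframe=137338043872 \
         ./task_cancel_test
```

If the fc build allows it, building Boost.Context/Coroutine with its valgrind support enabled (the `valgrind=on` b2 feature, which defines `BOOST_USE_VALGRIND` and registers fiber stacks with valgrind) should also eliminate these particular false positives.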
pavel.baykov commented 2022-03-25 12:38:56 +00:00 (Migrated from gitlab.com)

Based on the valgrind stack trace, I think `start_next_fiber` needs to check the task's cancel state.
pavel.baykov commented 2022-04-01 12:02:58 +00:00 (Migrated from gitlab.com)

One of the subproblems was discussed in https://github.com/cryptonomex/graphene/issues/597
pavel.baykov commented 2022-04-01 12:08:09 +00:00 (Migrated from gitlab.com)

During failure shutdown it sometimes crashed in logging; one potential solution is to run an external logger process and pass messages to it via shared memory.
pavel.baykov commented 2022-04-01 12:15:21 +00:00 (Migrated from gitlab.com)

Memory errors are also observed from blowfish_chain_test under sanitizers / valgrind.
serkixenos commented 2022-04-05 22:47:18 +00:00 (Migrated from gitlab.com)

For testing, we will monitor the CI/CD logs for errors.
serkixenos (Migrated from gitlab.com) closed this issue 2022-04-05 22:47:22 +00:00
Reference: Peerplays_Blockchain/peerplays_migrated#330