Report - Previously Unclassified Work Unit Errors

Message boards : RALPH@home bug list : Report - Previously Unclassified Work Unit Errors

hugothehermit

Joined: 17 Feb 06
Posts: 17
Credit: 2,170
RAC: 0
Message 861 - Posted: 12 Mar 2006, 9:44:45 UTC

Well, if that was the case, would it not have errored out straight after the benchmarks?

But seeing as BOINC left it in memory at the benchmark, that can't be the reason for the failure, can it?


Yep, it can. 5.2.13 fixed the applications being turfed out of memory, no matter what your options were, when it did a benchmark.

Is this a definite case? No. It just looks probable, as BOINC could benchmark, then ask for work, then find out the app is stuffed.

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 873 - Posted: 14 Mar 2006, 1:59:53 UTC

stuck at 10.32%
https://ralph.bakerlab.org/result.php?resultid=16410
Rosetta_beta_4.84 Linux

load average: 0.00, 0.00, 0.18 (whole system)

*re-starting boinc

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 874 - Posted: 14 Mar 2006, 15:56:19 UTC

Exit status 131 (0x83)
https://ralph.bakerlab.org/result.php?resultid=16558
Rosetta_beta 4.84 Linux

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 877 - Posted: 15 Mar 2006, 6:46:39 UTC - in response to Message 859.  


Mike,

Correct me if I am wrong, but I thought I saw a post from you before indicating that you were running a BOINC version later than 5.2.13. If so you are correct that the error makes no sense. If you are running 5.2.13, then the Work Unit was removed from memory when the benchmark ran and that is why it errored out.



I am running 5.2.15


Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 886 - Posted: 16 Mar 2006, 21:41:49 UTC

SIGSEGV: segmentation violation
Stack trace (11 frames):
https://ralph.bakerlab.org/result.php?resultid=18330
Rosetta_beta 4.84 Linux

*app swapping has not occurred, nor benchmarking -:(

Date Host Project ID Message
3/16/2006 5:49:39 PM crobertp.cp3 ralph@home 1098 Finished download of cc1opd_03_05.200_v1_3.gz
3/16/2006 5:49:39 PM crobertp.cp3 ralph@home 1099 Throughput 31053 bytes/sec
3/16/2006 5:50:14 PM crobertp.cp3 ralph@home 1100 Finished download of cc1opd_09_05.200_v1_3.gz
3/16/2006 5:50:14 PM crobertp.cp3 ralph@home 1101 Throughput 32962 bytes/sec
3/16/2006 5:50:15 PM crobertp.cp3 --- 1102 request_reschedule_cpus: files downloaded
3/16/2006 6:25:20 PM crobertp.cp3 ralph@home 1103 Unrecoverable error for result HB_BARCODE_30_1b3aA_351_30_0 (process exited with code 131 (0x83))
3/16/2006 6:25:20 PM crobertp.cp3 --- 1104 request_reschedule_cpus: process exited
3/16/2006 6:25:20 PM crobertp.cp3 ralph@home 1105 Computation for result HB_BARCODE_30_1b3aA_351_30_0 finished
3/16/2006 6:25:20 PM crobertp.cp3 QMC@HOME 1106 Restarting result one_pwcdna_nodelete.1998_0 using Amolqc-alpha version 505
3/16/2006 6:26:21 PM crobertp.cp3 ralph@home 1107 Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
3/16/2006 6:26:21 PM crobertp.cp3 ralph@home 1108 Reason: To report results
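
For anyone sifting through a long message log for failures like the one above, here is a minimal Python sketch that pulls the result name and exit code out of "Unrecoverable error" lines. The regular expression is an assumption based only on the two line formats quoted in this thread, and the function name `find_unrecoverable` is made up for illustration; neither is part of BOINC.

```python
import re

# Assumed pattern, inferred from the log lines quoted in this thread;
# other BOINC versions may word the message differently.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\((?:process exited with code| - exit code) "
    r"(?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def find_unrecoverable(lines):
    """Return (result_name, exit_code, hex_text) for each matching log line."""
    found = []
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            found.append((match["result"], int(match["code"]), match["hex"]))
    return found


if __name__ == "__main__":
    sample = [
        "3/16/2006 6:25:20 PM crobertp.cp3 ralph@home 1103 Unrecoverable error "
        "for result HB_BARCODE_30_1b3aA_351_30_0 (process exited with code 131 (0x83))",
        "3/16/2006 6:25:20 PM crobertp.cp3 --- 1104 request_reschedule_cpus: process exited",
    ]
    for name, code, hex_text in find_unrecoverable(sample):
        print(name, code, hex_text)
```

Lines that do not match the pattern are skipped silently, so the same function can be fed a whole log file line by line.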



Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 897 - Posted: 17 Mar 2006, 19:27:18 UTC
Last modified: 17 Mar 2006, 19:32:22 UTC

SIGSEGV: segmentation violation
Stack trace (11 frames):
https://ralph.bakerlab.org/result.php?resultid=18472
SIGSEGV: segmentation violation
Stack trace (11 frames):
https://ralph.bakerlab.org/result.php?resultid=18966
Exit status 0 (0x0)
https://ralph.bakerlab.org/result.php?resultid=19471
Rosetta_beta 4.84 Linux for all results above

hugothehermit
Joined: 17 Feb 06
Posts: 17
Credit: 2,170
RAC: 0
Message 902 - Posted: 18 Mar 2006, 6:25:48 UTC

This WU hasn't been doing anything (it's not stuck on 1%, it's stuck on 0%) for, I would guess, about 9 hours. I can't find when it started in the messages, as I had a power outage.

stderr.txt
# random seed: 3985987
No heartbeat from core client for 31 sec - exiting


I would guess it never got around to properly exiting, as the other (HT) CPU is working away, no worries. It's probably just an error at my end.







hugothehermit
Joined: 17 Feb 06
Posts: 17
Credit: 2,170
RAC: 0
Message 903 - Posted: 18 Mar 2006, 6:33:18 UTC

I just did a reboot and the WU is now working.

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 910 - Posted: 19 Mar 2006, 1:35:57 UTC

*** glibc detected *** corrupted double-linked list: 0x0894a300 ***
https://ralph.bakerlab.org/result.php?resultid=21152
Rosetta_beta 4.84 Linux

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 911 - Posted: 19 Mar 2006, 6:17:31 UTC

SIGSEGV: segmentation violation
Stack trace (11 frames):
https://ralph.bakerlab.org/result.php?resultid=19920
https://ralph.bakerlab.org/result.php?resultid=20503
Rosetta_beta 4.84 Linux


Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 941 - Posted: 22 Mar 2006, 2:35:03 UTC

Exit status 2 (0x2)
https://ralph.bakerlab.org/result.php?resultid=49678
Rosetta_beta 4.85 Linux

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 948 - Posted: 22 Mar 2006, 7:34:19 UTC
Last modified: 22 Mar 2006, 7:38:40 UTC

stuck at 78.47%
https://ralph.bakerlab.org/result.php?resultid=49653
Rosetta_beta 4.85 Linux

load average: 0.00, 0.00, 0.17

*re-starting boinc, the following message appears on the Linux console
*** glibc detected *** double free or corruption (fasttop): 0x0914b110 ***

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 950 - Posted: 22 Mar 2006, 12:40:05 UTC
Last modified: 22 Mar 2006, 12:42:27 UTC

Exit status 131 (0x83)
*** glibc detected *** corrupted double-linked list: 0x0986c7e0 ***
SIGSEGV: segmentation violation
Stack trace (12 frames):

https://ralph.bakerlab.org/result.php?resultid=49639
Rosetta_beta 4.85 Linux

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 952 - Posted: 22 Mar 2006, 15:30:10 UTC
Last modified: 22 Mar 2006, 15:32:14 UTC

Exit status 139 (0x8b)
process got signal 11
SIGSEGV: segmentation violation
Stack trace (10 frames):

https://ralph.bakerlab.org/result.php?resultid=49708
Rosetta_beta 4.85 Linux
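
As a reading aid for the numbers in these reports: on Linux, a status above 128 is commonly read as 128 plus the number of the signal that ended the process, so 139 corresponds to signal 11 (SIGSEGV), which matches the "process got signal 11" line. Under the same reading, 131 would be 128 + 3 (SIGQUIT), even though the reports with status 131 in this thread show SIGSEGV in their output, so the convention should be treated as a hint rather than a diagnosis. The sketch below applies that convention. Whether this BOINC version reports the status in exactly this way is an assumption, and the `describe_exit_status` helper and its small signal table are illustrative, not taken from BOINC.

```python
# Signal numbers below are the usual Linux values; the table is deliberately
# small and only covers signals relevant to this thread.
LINUX_SIGNALS = {3: "SIGQUIT", 6: "SIGABRT", 11: "SIGSEGV"}


def describe_exit_status(status):
    """Interpret an exit status using the common '128 + signal' convention."""
    if status == 0:
        return "exit status 0: normal exit"
    if status > 128:
        signum = status - 128
        name = LINUX_SIGNALS.get(signum, "unknown signal")
        return f"exit status {status} (0x{status:x}): 128 + {signum} ({name})"
    return f"exit status {status} (0x{status:x}): ordinary non-zero exit code"


if __name__ == "__main__":
    for status in (0, 2, 131, 139):
        print(describe_exit_status(status))
```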


Snake Doctor
Joined: 16 Feb 06
Posts: 37
Credit: 998,880
RAC: 0
Message 957 - Posted: 22 Mar 2006, 22:54:28 UTC

Every ralph WU that has hit my system since the release of Mac version 4.86 has crashed. Up till now I had only seen one WU fail. The errors say that BOINC library 5.2.27 was used to compile the application. Like this one here


I am running BOINC 5.1.13, which is the current release version. Also, some of the errors are for a missing file, such as this one here

Whatever was changed for version 4.86 on the Mac is clearly not working.

Regards
Phil

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1006 - Posted: 28 Mar 2006, 11:08:38 UTC

Rosetta_beta 4.87 Linux
Exit status 131 (0x83) SIGSEGV
https://ralph.bakerlab.org/result.php?resultid=68531
https://ralph.bakerlab.org/result.php?resultid=68618


casio7131
Joined: 20 Mar 06
Posts: 15
Credit: 12,660
RAC: 0
Message 1044 - Posted: 8 Apr 2006, 4:44:50 UTC
Last modified: 8 Apr 2006, 4:48:32 UTC

8/04/2006 2:22:05 PM|ralph@home|Unrecoverable error for result HBLR_1.0_2tif_375_18_0 ( - exit code -1073741819 (0xc0000005))
resultid=79533
died after about 40 min

8/04/2006 2:33:01 PM|ralph@home|Unrecoverable error for result HBLR_1.0_1b72_375_87_0 ( - exit code -1073741819 (0xc0000005))
resultid=80087
died after about 11 min

Note, I've had similar problems with these HB work units in Rosetta too:
(see https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1106#13206)

8/04/2006 1:44:51 PM|rosetta@home|Unrecoverable error for result HBLR_1.0_1hz6_420_4766_0 ( - exit code -1073741811 (0xc000000d))
resultid=16362541
3/04/2006 11:38:45 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_4ubpA_351_49332_0 ( - exit code -1073741811 (0xc000000d))
resultid=15780509
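
The negative exit codes in these Windows log lines are 32-bit values printed as signed integers; masking to 32 bits recovers the hexadecimal form shown in parentheses. A minimal sketch follows. The two names in the lookup table are the standard Windows NTSTATUS names for these values; the helper `to_ntstatus_hex` is hypothetical and only for illustration.

```python
# Standard NTSTATUS names for the two codes quoted in this thread.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def to_ntstatus_hex(signed_code):
    """Convert a signed 32-bit exit code to (hex string, name or None)."""
    unsigned = signed_code & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    return f"0x{unsigned:08x}", KNOWN_NTSTATUS.get(unsigned)


if __name__ == "__main__":
    for code in (-1073741819, -1073741811):
        print(code, *to_ntstatus_hex(code))
```

An access violation on Windows is roughly the counterpart of the SIGSEGV reports from the Linux hosts earlier in the thread, though that alone does not show the failures share a cause.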

casio7131
Joined: 20 Mar 06
Posts: 15
Credit: 12,660
RAC: 0
Message 1045 - Posted: 8 Apr 2006, 5:25:59 UTC
Last modified: 8 Apr 2006, 5:26:33 UTC

and now another HB failure.

8/04/2006 3:38:24 PM|ralph@home|Unrecoverable error for result HBLR_1.0_1di2_375_67_0 ( - exit code -1073741819 (0xc0000005))

https://ralph.bakerlab.org/result.php?resultid=79922



©2024 University of Washington
http://www.bakerlab.org