Message boards : RALPH@home bug list : Report - Previously Unclassified Work Unit Errors
Author | Message |
---|---|
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Ananas gets error exit 1 (0x1) in the BBC climate prediction app too. In that case, he attributes this error to a missing DLL in c:\winnt\system32: mscoree.dll. It seems, then, that the Windows app needs to be static too. I never saw that DLL on my Windows 3.11 or 95/98, hehe. Read the full thread here, if you want: http://boinc.bio.wzw.tum.de/boincsimap/forum/viewtopic.php?t=248 |
Robert Everly Joined: 16 Feb 06 Posts: 10 Credit: 2,333 RAC: 0 |
Haven't seen this error before. I checked in on how things were going, saw a 4.91, and gave the graphics a shot. It ran for a couple of minutes, then went all wacky: the accepted protein model disappeared, as did both graphs. It advanced a couple of steps and then hard-locked the computer. I also got a bunch of runtime error pop-up boxes. No screenshots, though, because of the lockup; I had to do a cold reboot. Anyway, here is the WU: https://ralph.bakerlab.org/result.php?resultid=13679 and the error result: <core_client_version>5.2.12</core_client_version> |
doc :) Joined: 16 Feb 06 Posts: 46 Credit: 4,437 RAC: 0 |
Got this one with 4.91: 03/03/2006 02:06:15|ralph@home|Unrecoverable error for result BARCODE_30_1iibA_227_10_1 ( - exit code -1073741811 (0xc000000d)) The graphics were open in a window (for an hour or so), then it simply crashed. That's the same error I get with Rosetta 4.82 when I have the graphics open; everything seems to work fine when I don't open the graphics. WU - result |
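The exit codes in these BOINC log lines are Windows NTSTATUS values printed first as a signed 32-bit integer and then in hex. A minimal Python sketch for decoding them, assuming the log line format shown above; the function names and the lookup table are illustrative (not part of BOINC), and the table holds only the two codes reported in this thread: 0xC0000005 (STATUS_ACCESS_VIOLATION) and 0xC000000D (STATUS_INVALID_PARAMETER).

```python
import re

# Illustrative table: only the NTSTATUS codes that appear in this thread.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}

# Matches the "exit code <signed> (0x<hex>)" fragment of a BOINC log line.
EXIT_CODE_RE = re.compile(r"exit code (-?\d+) \(0x([0-9a-fA-F]+)\)")


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned value."""
    return code & 0xFFFFFFFF


def describe_exit_code(line: str) -> str:
    """Extract the exit code from a log line and name it if it is known."""
    match = EXIT_CODE_RE.search(line)
    if match is None:
        raise ValueError("no exit code found in line")
    status = to_unsigned32(int(match.group(1)))
    # The hex value printed alongside should agree with the signed one.
    if status != int(match.group(2), 16):
        raise ValueError("signed and hex exit codes disagree")
    name = KNOWN_NTSTATUS.get(status, "unknown status")
    return f"0x{status:08X} {name}"


line = ("03/03/2006 02:06:15|ralph@home|Unrecoverable error for result "
        "BARCODE_30_1iibA_227_10_1 ( - exit code -1073741811 (0xc000000d))")
print(describe_exit_code(line))  # 0xC000000D STATUS_INVALID_PARAMETER
```

The same function applied to the 0xc0000005 lines elsewhere in this thread reports STATUS_ACCESS_VIOLATION, which matches the "Access Violation" text in the stderr output.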
David@home Joined: 16 Feb 06 Posts: 24 Credit: 409 RAC: 0 |
This WU finished using v4.92 and claimed credit, but it contained some interesting messages, so it's worth a look by the experts:
# Exception caught in nstruct loop ii=1 i=40
# num_decoys:39 attempts:40 cpu_run_time:26311.8
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C910E03 write attempt to address 0x00000000
# cpu_run_time_pref: 28800
The WU result is resultid 14051. |
Spare_Cycles Joined: 16 Feb 06 Posts: 17 Credit: 12,942 RAC: 0 |
Quote: "This WU finished using v 4.92 and claimed credit but contained some interesting messages so worth a look by the experts:" It looks like the WU errored out and you would have gotten zero credit, but the new code that we're now testing kicked in and salvaged the WU. |
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
I crunched 6 WUs using rosetta_beta_4.92 (Windows) and have NO errors. However, with rosetta_beta_4.84 (Linux) I have several WUs with errors, ALL with the same error -> SIGSEGV: https://ralph.bakerlab.org/result.php?resultid=12969 https://ralph.bakerlab.org/result.php?resultid=13093 https://ralph.bakerlab.org/result.php?resultid=13267 https://ralph.bakerlab.org/result.php?resultid=13987 https://ralph.bakerlab.org/result.php?resultid=14057 https://ralph.bakerlab.org/result.php?resultid=14534 |
David@home Joined: 16 Feb 06 Posts: 24 Credit: 409 RAC: 0 |
This WU had an unrecoverable error (result 13723). From the BOINC log: 05/03/2006 20:37:03|ralph@home|Unrecoverable error for result BARCODE_30_1c8cA_236_4_0 ( - exit code -1073741819 (0xc0000005)) System: XP Pro SP2, single Intel P4 CPU, no HT, BOINC 5.2.13. |
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
Carlos, I think the most probable explanation for the SIGSEGV is that your Linux PC has only 256MB of RAM, whereas your WinXP PC has 512MB. Rosetta needs a lot of memory relative to other apps: the WinXP PC next to me is running 2 Rosetta tasks, one with a 125MB working set, while the other consumes just 45MB. So if your Linux PC got the former, it would probably crash with SIGSEGV; if it got the latter, it would probably run fine. With 256MB of RAM in a PC, it's a coin toss. I hope that eventually the BOINC/R@h system will become "smarter" so that it can send smaller proteins to PCs with less RAM. Run "free" on your Linux machine before and after starting BOINC/Rosetta and let us know. Quote: "I crunched 6 WUs using rosetta_beta_4.92 (windows) and have NO errors" |
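On Linux, the numbers that `free` prints come from /proc/meminfo. A minimal Python sketch of the suggested before/after check, assuming a Linux system; the helper names are hypothetical, and swap usage is included because heavy swapping is the visible symptom of a RAM shortage.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: kilobytes}."""
    info = {}
    for raw in text.splitlines():
        if ":" not in raw:
            continue
        key, _, rest = raw.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # values are reported in kB
    return info


def summarize(info: dict) -> str:
    """One-line summary: total RAM, free RAM, and swap currently in use."""
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return (f"total={info['MemTotal']} kB free={info['MemFree']} kB "
            f"swap_used={swap_used} kB")


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        print(summarize(parse_meminfo(meminfo.read_text())))
```

Running it once before starting BOINC and once while Rosetta is crunching shows how much RAM the task takes and whether the machine has started swapping.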
Spare_Cycles Joined: 16 Feb 06 Posts: 17 Credit: 12,942 RAC: 0 |
Quote: "Carlos, I think the most probable explanation for SIGSEGV is because your Linux PC has only 256MB of RAM, whereas your WinXP PC has 512MB RAM." A lack of physical memory will never cause a SIGSEGV on a properly functioning modern PC. Programs run in virtual memory, and the virtual memory looks exactly the same regardless of how much physical memory there is. If there isn't enough physical memory, there will be a lot of swapping to disk, which can slow things way down. That can cause a problem if the computer is doing something like burning a CD, but it will never cause an error in a crunching program like RALPH/Rosetta. That assumes the PC is working properly: if, for instance, there are errors when reading the hard disk, then pages will be corrupted when they are swapped back in. |
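A SIGSEGV is what the kernel sends when a process touches an address that is not validly mapped for that access, such as a null or stray pointer, which is a bug rather than a RAM shortage. A minimal Python sketch of how a POSIX parent process sees this, assuming Linux; the function name and the `--demo` flag are hypothetical. Python's `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Interpret a subprocess return code on a POSIX system.

    subprocess reports a child killed by signal N as returncode -N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__" and "--demo" in sys.argv and sys.platform.startswith("linux"):
    # The child reads from address 0, an invalid pointer. The kernel delivers
    # SIGSEGV for that access however much physical RAM the machine has.
    child = subprocess.run([sys.executable, "-c", "import ctypes; ctypes.string_at(0)"])
    print(describe_returncode(child.returncode))  # e.g. "killed by SIGSEGV"
```

The demo only runs when explicitly asked for with `--demo`, since it deliberately crashes a child process.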
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Quote: "Carlos, I think the most probable explanation for SIGSEGV is because your Linux PC has only 256MB of RAM, whereas your WinXP PC has 512MB RAM." However, I believe the most probable cause is that the app is not statically linked and is using my old libc.so.6:
crobertp [/home/boinc/BOINC] > ls /lib/libc* -lha
-rw-r--r-- 1 root root 1.2M Oct 13 2004 /lib/libc-2.3.2.so
lrwxrwxrwx 1 root root 13 Oct 18 2004 /lib/libc.so.6 -> libc-2.3.2.so
lrwxrwxrwx 1 root root 14 May 3 2003 /lib/libcap.so.1 -> libcap.so.1.10
-rw-r--r-- 1 root root 9.2k Jan 31 2003 /lib/libcap.so.1.10
lrwxrwxrwx 1 root root 17 May 3 2003 /lib/libcom_err.so.2 -> libcom_err.so.2.0
-rw-r--r-- 1 root root 5.3k Jan 6 2003 /lib/libcom_err.so.2.0
-rw-r--r-- 1 root root 18k Oct 13 2004 /lib/libcrypt-2.3.2.so
lrwxrwxrwx 1 root root 17 Oct 18 2004 /lib/libcrypt.so.1 -> libcrypt-2.3.2.so
crobertp [/home/boinc/BOINC] >
* These libs were not old when I first booted my PC in mid-2004. However, I know a couple of newer libc.so.6 versions have been developed since then, and they contain newer functions that were not even imagined in 2004.
* Surely you get a SIGSEGV each time you use one of these newer libc calls that does not exist in my libc.
* Of course, you can use the newer calls without problems IF your app is statically linked.
BTW: I get Exit status 1 (0x1) running rosetta 4.82 on the same PC, while rosetta_beta_4.92 ran OK: https://boinc.bakerlab.org/rosetta/result.php?resultid=12625437 |
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
Carlos, on second thought, you and SpareCycles are probably correct that the 256MB of RAM is not the reason for the SIGSEGV. On the other hand, my version of RALPH for Linux seems to be statically linked:
$ ldd rosetta_beta_4.84_i686-pc-linux-gnu
not a dynamic executable
$ file rosetta_beta_4.84_i686-pc-linux-gnu
rosetta_beta_4.84_i686-pc-linux-gnu: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, statically linked, stripped |
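Whether an executable depends on the system's libc at run time can be read from its ELF program headers: a dynamically linked executable carries a PT_INTERP entry naming its dynamic loader, while a classic statically linked one has none. A rough Python sketch of that check, assuming a standard 32- or 64-bit ELF file; the function name is hypothetical, and this is only an approximation of what `ldd` and `file` report, not how those tools are implemented.

```python
import struct
import sys
from pathlib import Path

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_has_interp(data: bytes) -> bool:
    """Return True if an ELF image contains a PT_INTERP program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian
    if ei_class == 1:  # ELF32: e_phoff at offset 28, e_phentsize/e_phnum at 42
        (phoff,) = struct.unpack_from(endian + "I", data, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 42)
    elif ei_class == 2:  # ELF64: e_phoff at offset 32, e_phentsize/e_phnum at 54
        (phoff,) = struct.unpack_from(endian + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        raise ValueError("unknown ELF class")
    for i in range(phnum):
        # p_type is the first 32-bit field of every program header entry.
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    path = Path(sys.argv[1])
    kind = "dynamic" if elf_has_interp(path.read_bytes()) else "static (no PT_INTERP)"
    print(f"{path}: {kind}")
```

Pointed at the rosetta_beta_4.84_i686-pc-linux-gnu binary, a file that `ldd` calls "not a dynamic executable" should come out as static.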
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
Quote: "Carlos, on second thought, you and SpareCycles are probably correct about the 256M RAM not being the reason for SIGSEGV, but on the other hand, my version of RALPH for Linux seems to be statically linked:" The debug routines use more memory than Rosetta would, and in fact the debug information was turned on after 4.82. So this may be the reason you are having the errors, or at least a contributing factor. Moderator9 |
hugothehermit Joined: 17 Feb 06 Posts: 17 Credit: 2,170 RAC: 0 |
(SIGSEGV SIGnal SEGmentation Violation) The quote is from here
Is anyone else getting this error? If not, I would check your hard disk and memory, and reset the project (to re-download the app). It would be very strange for only one computer to have "found" a misallocated pointer or an array running past its limit, etc. Though stranger things have happened :? Edit: to fix up spelling and a bit of formatting ... and a bit more ... and again |
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Quote: "(SIGSEGV SIGnal SEGmentation Violation) The quote is from here" Thanks; however, more than *one* PC has errored out on some of my results too. Maybe not every alpha tester had the patience and time to report here. Click on the results I posted and then, on each one, click Workunit. I did this only for a few ... this one, for example: https://ralph.bakerlab.org/result.php?resultid=14553 It reports something about an uninitialized stackwalker, surely the cause of the SIGSEGV. BTW: my smartd daemon is not reporting any errors on my hda :-) |
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
Unless it's old news, WU 11798 might be worth a look. I was surprised to see it complete for me after it errored out for two others: one on 4.91, the other on 4.92 like me. Different CPUs, all Windows XP variants; the errors were access violations, one read and one write. My run-time preference was shorter than theirs, but one of the errors happened sooner than it took my WU to complete. Looks nasty to figure out; good luck. P.S. Ah, one significant difference I forgot! I had to exit and restart the client for an unrelated reason (I did not log out or reboot, however) when the WU was about 90 minutes in. I remember being surprised that it didn't start over; I guess Rosetta's checkpointing is more fully implemented than I realized. |
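Resuming after a client restart instead of starting over is what application-level checkpointing provides. A minimal Python sketch of the general pattern, assuming a hypothetical JSON checkpoint file and treating each "decoy" as one unit of work; Rosetta's actual checkpoint format and frequency are not described in this thread.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file first so a crash mid-write keeps the old checkpoint."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, path)  # swaps in the new checkpoint in one step (atomic on POSIX)


def load_checkpoint(path: Path) -> dict:
    """Resume from the last checkpoint, or start fresh if there is none."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_decoy": 0}


def run(path: Path, total_decoys: int, stop_after: Optional[int] = None) -> int:
    """Hypothetical work loop; returns how many decoys are finished overall.

    stop_after simulates the client being shut down part-way through.
    """
    state = load_checkpoint(path)
    finished_this_run = 0
    while state["next_decoy"] < total_decoys:
        if stop_after is not None and finished_this_run == stop_after:
            break
        # ... the expensive work for one decoy would happen here ...
        state["next_decoy"] += 1
        finished_this_run += 1
        save_checkpoint(path, state)  # checkpoint after every completed decoy
    return state["next_decoy"]
```

Calling `run(path, 10, stop_after=4)` and then `run(path, 10)` finishes 4 decoys, then picks up at decoy 4 on the second call instead of starting over, which is the behaviour observed above.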
hugothehermit Joined: 17 Feb 06 Posts: 17 Credit: 2,170 RAC: 0 |
Quote: "Thanks, however more than *one* pc has erroed out some of my results too" You may well be right. It seems sensible that swapping the app out of memory would almost always expose a misallocated or misused piece of memory, such as an undefined or stale pointer or an array overrun, whereas you could sometimes get away with it if the memory hasn't been changed. As it has happened on both of your OSes, you could assume that it is the code, not the compiler; a line-by-line search of the code is in order, I think. |
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
App: Rosetta_beta_4.84 (Linux). Exit status 2 (0x2): https://ralph.bakerlab.org/result.php?resultid=15867 Exit status 131 (0x83): https://ralph.bakerlab.org/result.php?resultid=15886 |
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Rosetta Beta 4.92 under Win 2000 SP4: https://ralph.bakerlab.org/result.php?resultid=15965 Unexplained error. I have the system set to leave apps in memory, so I don't think it's that. Text from file:
3/11/2006 12:04:54 PM|ralph@home|Pausing result 7449_fullatom_relax_evdec00_2_0001.pdb_246_1_0 (left in memory)
3/11/2006 12:04:57 PM||Running CPU benchmarks
3/11/2006 12:05:55 PM||Benchmark results:
3/11/2006 12:05:55 PM|| Number of CPUs: 1
3/11/2006 12:05:55 PM|| 1166 double precision MIPS (Whetstone) per CPU
3/11/2006 12:05:55 PM|| 2274 integer MIPS (Dhrystone) per CPU
3/11/2006 12:05:55 PM||Finished CPU benchmarks
3/11/2006 12:05:56 PM||Resuming computation and network activity
3/11/2006 12:05:56 PM||request_reschedule_cpus: Resuming activities
3/11/2006 12:05:56 PM|ralph@home|Resuming result 7449_fullatom_relax_evdec00_2_0001.pdb_246_1_0 using rosetta_beta version 492
3/11/2006 12:20:37 PM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
3/11/2006 12:20:37 PM|ralph@home|Reason: To fetch work
3/11/2006 12:20:37 PM|ralph@home|Requesting 96635 seconds of new work
3/11/2006 12:20:41 PM|ralph@home|Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
3/11/2006 12:20:41 PM|ralph@home|No work from project
3/11/2006 12:53:07 PM|ralph@home|Unrecoverable error for result 7449_fullatom_relax_evdec00_2_0001.pdb_246_1_0 ( - exit code -1073741811 (0xc000000d))
3/11/2006 12:53:07 PM||request_reschedule_cpus: process exited
3/11/2006 12:53:07 PM|ralph@home|Computation for result 7449_fullatom_relax_evdec00_2_0001.pdb_246_1_0 finished
3/11/2006 12:53:07 PM|SETI@home|Starting result 11ap03ab.5070.14416.928404.1.62_0 using setiathome version 418 |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
Quote: "Rosetta Beta 4.92 under Win 2000 SP 4" Mike, correct me if I am wrong, but I thought I saw an earlier post from you indicating that you were running a BOINC version later than 5.2.13. If so, you are correct that the error makes no sense. If you are running 5.2.13, then the work unit was removed from memory when the benchmark ran, and that is why it errored out. Moderator9 |
UBT - Halifax--lad Joined: 15 Feb 06 Posts: 29 Credit: 2,723 RAC: 0 |
Well, if that were the case, wouldn't it have errored out straight after the benchmarks? But seeing as BOINC left it in memory during the benchmark, that can't be the reason for the failure, can it?
©2024 University of Washington
http://www.bakerlab.org