Bug reports for Ralph 5.17-5.19

Message boards : RALPH@home bug list : Bug reports for Ralph 5.17-5.19



tralala

Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1730 - Posted: 1 Jun 2006, 9:37:57 UTC - in response to Message 1728.  
Last modified: 1 Jun 2006, 9:52:19 UTC

While the progress of a unit reached 100% in my BOINC client, strangely it continued to work, and the percentage completed shown in the graphic stayed below 100%, at approximately 57%.
(Here you can see a screenshot of it.) It continued working on the unit for several more minutes, but it remained at the same percentage.

Is this a bug or not? Thanks,



I think this is related to the phenomenon Mike Gelvin reported, where some WUs remain dormant after showing 100%. Perhaps you can check the Task Manager on the next WU.

tralala

Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1731 - Posted: 1 Jun 2006, 9:38:59 UTC - in response to Message 1730.  
Last modified: 1 Jun 2006, 9:54:58 UTC

Instant failure; no graphics were displayed, but two preempted WUs from Rosetta were in memory:
https://ralph.bakerlab.org/result.php?resultid=145667

I have two more of those WUs and will watch them closely.

How can I turn on debug information?

tralala

Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1732 - Posted: 1 Jun 2006, 10:53:43 UTC - in response to Message 1731.  

The next two WUs failed as well, with the same error code after the same time (6 sec):

https://ralph.bakerlab.org/result.php?resultid=145641
https://ralph.bakerlab.org/result.php?resultid=145639

tralala

Send message
Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1733 - Posted: 1 Jun 2006, 11:03:50 UTC - in response to Message 1730.  

I can confirm this bug reported by suguruhirahara and Mike Gelvin. This WU:
https://ralph.bakerlab.org/result.php?resultid=145529

finished after 54 minutes: BOINC reported 100%, but the task was still running, and the Task Manager showed 0% CPU utilization. This lasted for about 5 minutes until it finished. I checked the screensaver: the progress was 89%, and it showed that a 4th model had been started, although that model did not in fact seem to be computed. Here is the screenshot.
suguruhirahara

Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1734 - Posted: 1 Jun 2006, 11:12:10 UTC - in response to Message 1732.  
Last modified: 1 Jun 2006, 11:12:53 UTC

I got the same kind of error on all of the workunits I fetched, for example:
https://ralph.bakerlab.org/result.php?resultid=145741
https://ralph.bakerlab.org/result.php?resultid=145770
WTBroughton

Joined: 17 Feb 06
Posts: 1
Credit: 97,755
RAC: 38
Message 1735 - Posted: 1 Jun 2006, 11:19:31 UTC

I just re-attached to the project; 3 WUs failed after 12 seconds.

BOINCLogX reports Incorrect function. (0x1) - exit code 1 (0x1); ERROR:: Exit at: .barcode_classes.cc line:500

WTBroughton
BOINC in Northumberland
wizzszz

Joined: 28 Apr 06
Posts: 17
Credit: 1,128
RAC: 0
Message 1736 - Posted: 1 Jun 2006, 11:58:46 UTC

All WUs I tried crashed within the first few seconds.


Unrecoverable error for result t296__CASP7_ABINITIO_SAVE_ALL_OUT_nterm_hom009__612_6_0 (Incorrect function. (0x1) - exit code 1 (0x1))

Unrecoverable error for result t296__CASP7_ABINITIO_SAVE_ALL_OUT_cterm_hom017__610_9_0 (Incorrect function. (0x1) - exit code 1 (0x1))

Unrecoverable error for result t296__CASP7_ABINITIO_SAVE_ALL_OUT_cterm_hom001__610_3_1 (Incorrect function. (0x1) - exit code 1 (0x1))

Unrecoverable error for result t296__CASP7_ABINITIO_SAVE_ALL_OUT_cterm_hom001__612_10_0 (Incorrect function. (0x1) - exit code 1 (0x1))


Mike Gelvin

Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 1737 - Posted: 1 Jun 2006, 15:06:20 UTC
Last modified: 1 Jun 2006, 15:07:55 UTC

I can confirm the previously reported error-out condition:

Incorrect function. (0x1) - exit code 1 (0x1); ERROR:: Exit at: .barcode_classes.cc line:500

in these two work units:

146315
145904

AND

after observing several more dormant cases, I conclude that the application stays dormant until the wall-clock time reaches a whole number of hours after the app started, and only then terminates "normally". The timing is exact TO THE SECOND, and it is independent of the CPU time used.
dainenyu

Send message
Joined: 19 Feb 06
Posts: 6
Credit: 7,772
RAC: 0
Message 1738 - Posted: 1 Jun 2006, 17:12:35 UTC
Last modified: 1 Jun 2006, 17:53:02 UTC

I received 12 WUs using 5.19: eleven t296__CASP7_ABINITIO_* and one t299__CASP7_ABRELAX. The ABRELAX is running fine (I was able to suspend/resume it with no problems), but all of the ABINITIOs errored out during initialization.

For the ABINITIOs:
<core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 3600
ERROR:: Exit at: .barcode_classes.cc line:500

</stderr_txt>



Huh. While I was watching the ABRELAX, it whizzed along to step 340000, seemed to pause for a bit, and is now moving very slowly. The "to completion" time keeps increasing. The CPU keeps working, though.

Update: At 40 min (66.6% complete according to the graphic, 100% according to the client), the ABRELAX unit seems to have gone dormant. It stopped at step 340539 in stage 2 and is not using any CPU. I'll keep an eye on it to see if/when it resumes.

Update 2: Or... it really was finished? It exited with no problems and is valid. Never mind about that one, then.
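Many reports in this thread carry the same stderr signature as the ABINITIO failures above. As a rough sketch, assuming the `<stderr_txt>` contents have been copied from the result page as plain text, a few lines of Python can pull out the run-time preference (presumably in seconds, since 3600 matches RALPH's 1-hour default) and the `Exit at:` location, so identical failures can be grouped. `parse_stderr` is a hypothetical helper, not part of BOINC or Rosetta.

```python
import re
from typing import Optional, Tuple

# Patterns for the two lines seen in the stderr output quoted in this thread.
PREF_RE = re.compile(r"#\s*cpu_run_time_pref:\s*(\d+)")
EXIT_RE = re.compile(r"ERROR::\s*Exit at:\s*(\S+)\s+line:(\d+)")


def parse_stderr(text: str) -> Tuple[Optional[int], Optional[Tuple[str, int]]]:
    """Return (run-time preference, (source file, line)) from stderr text.

    Either element is None when the corresponding line is absent.
    """
    pref_match = PREF_RE.search(text)
    exit_match = EXIT_RE.search(text)
    pref = int(pref_match.group(1)) if pref_match else None
    where = (exit_match.group(1), int(exit_match.group(2))) if exit_match else None
    return pref, where


if __name__ == "__main__":
    stderr_txt = "# cpu_run_time_pref: 3600\nERROR:: Exit at: .barcode_classes.cc line:500\n"
    print(parse_stderr(stderr_txt))  # (3600, ('.barcode_classes.cc', 500))
```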
Carlos_Pfitzner

Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1739 - Posted: 1 Jun 2006, 18:37:10 UTC

Rosetta_beta 5.19 Windows

Exit status 1 (0x1)

Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .barcode_classes.cc line:500


https://ralph.bakerlab.org/result.php?resultid=146649
https://ralph.bakerlab.org/result.php?resultid=146648
[B^S] sTrey

Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1740 - Posted: 1 Jun 2006, 18:41:16 UTC
Last modified: 1 Jun 2006, 18:43:19 UTC

Ditto, same error shown in this result.

Errored out immediately (cpu = 14 seconds) with the exit code 1 @ barcode_classes.cc line:500

App is 5.19, client 5.4.9, Windows XP SP2.
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1741 - Posted: 1 Jun 2006, 19:51:15 UTC - in response to Message 1740.  

Thanks for posting about this! I've fixed the problem, and I'm sending out the workunit again.

Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1742 - Posted: 1 Jun 2006, 19:57:05 UTC - in response to Message 1726.  

Hi: this behavior occurs for some work units (the ones with "RELAX" in the name). They should complete if you wait up to an hour. If the workunit goes beyond four times your preference (4x1 hours on RALPH by default; 4x3 hours on Rosetta@home by default), go ahead and abort it. I've been paying close attention to WUs on my PowerBook (Mac OS 10.4.2); the simulations do run slower on a Mac than on Windows (for now), but I've never seen a workunit get totally stuck.
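The abort rule of thumb above is a simple threshold check. A minimal sketch, assuming "run time" means the elapsed time BOINC Manager shows for the task and that you know your run-time preference; `should_abort` and `ABORT_FACTOR` are hypothetical names for illustration, not a BOINC setting.

```python
from datetime import timedelta

# Hypothetical constant encoding the "four times your preference" rule of thumb.
ABORT_FACTOR = 4


def should_abort(elapsed: timedelta, run_time_pref: timedelta) -> bool:
    """True once a task has run longer than ABORT_FACTOR x the run-time preference."""
    return elapsed > ABORT_FACTOR * run_time_pref


if __name__ == "__main__":
    ralph_default = timedelta(hours=1)    # RALPH default preference
    rosetta_default = timedelta(hours=3)  # Rosetta@home default preference
    print(should_abort(timedelta(hours=3, minutes=15), ralph_default))    # False
    print(should_abort(timedelta(hours=4, minutes=1), ralph_default))     # True
    print(should_abort(timedelta(hours=8, minutes=53), rosetta_default))  # False
```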

rosetta_beta version 518
WU Name: JUMP_RELAX_3142__t285__SAVE_ALL_OUT_594_22_0
running on Mac OS 10.4.6

BOINC Manager Tasks tab shows CPU Time stuck at 08:53:24 and 70%
top command shows TIME = 36:04:48 and climbing

stopped and restarted BOINC
CPU Time reverted to 05:35:00 and 70% but no longer stuck


doc :)

Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 1743 - Posted: 1 Jun 2006, 20:56:36 UTC

Got some interesting 5.19 behavior too, though mostly I was not in front of my PC to see it happen.
The first result took less than 2 hours (my RALPH preference) according to the result page; in real time it took almost exactly 2 hours.
The next result took just a little less than 2 hours of processing time according to the result page, but in fact took 3 hours of real time.
The third result took almost exactly 2 hours on the result page, but also consumed 3 hours of real time.
The fourth was a little more complicated: it had been running for 3 hours when it got preempted. The CPU time in BOINC Manager showed 1:59:07 and the graphics showed the same CPU time, with 99.2% in the graphics but 100% in BOINC Manager. Upon being resumed, it completed instantly.
The last one I got ended with "Incorrect function" right at the start (result).
wizzszz

Joined: 28 Apr 06
Posts: 17
Credit: 1,128
RAC: 0
Message 1744 - Posted: 2 Jun 2006, 13:28:50 UTC
Last modified: 2 Jun 2006, 13:47:31 UTC

I think I found the problem causing the missing/broken graphics!


I am currently running the following WU:
FRA_t301_hom029_1_LOOPRLX_IGNORE_THE_REST__hom029_1_1bwzA__100_627_5_0 using rosetta_beta version 519

It didn't display ANY of the three graphics... I played around with the mouse and suddenly got some graphics in the low-energy window!

I can rotate it with the mouse until it is completely rotated out of the window, and I can zoom in so far that parts of the graphic are missing!

As you can see in the third picture, both other graphs have the same problem!

I think the rotation center point just needs to be placed at the center of the model... It currently seems to be far behind the model.

That would also fix the broken graph: parts of the graphic vanish because the model moves "out" of the display space!
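A minimal sketch of the fix proposed above, assuming the model is a list of 3D atom coordinates and the viewer rotates it about a vertical (y) axis. The actual Rosetta screensaver code is not shown in this thread, so `centroid` and `rotate_about_center` are hypothetical. Rotating about the model's own centroid keeps its center fixed on screen; rotating 180° about a point 100 units away would instead swing the model 200 units across the view, which matches the "rotated out of the window" symptom.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def centroid(points: List[Vec3]) -> Vec3:
    """Average position of all points (the 'center of the model')."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def rotate_about_center(points: List[Vec3], angle: float) -> List[Vec3]:
    """Rotate points by `angle` radians about a y-parallel axis through their centroid."""
    cx, cy, cz = centroid(points)
    c, s = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y, z in points:
        # Shift so the centroid sits at the origin...
        dx, dy, dz = x - cx, y - cy, z - cz
        # ...apply a standard rotation about the y axis...
        rx = c * dx + s * dz
        rz = -s * dx + c * dz
        # ...then shift back, so the model's center never moves on screen.
        rotated.append((rx + cx, dy + cy, rz + cz))
    return rotated


if __name__ == "__main__":
    # A two-atom "model" sitting far from the origin.
    model = [(100.0, 0.0, 0.0), (102.0, 0.0, 0.0)]
    print(centroid(rotate_about_center(model, math.pi)))  # (101.0, 0.0, 0.0)
```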

Let me know if my poor English explanation made the problem clear, or if you have further questions!

Update 1: later in this model the error vanishes, I think because the graphics change in a way that fixes the center point.




Pieface

Joined: 16 Feb 06
Posts: 64
Credit: 203,513
RAC: 0
Message 1745 - Posted: 2 Jun 2006, 13:48:36 UTC

I ran into one of those 'stuck' ABRELAX WUs overnight:
resid 144202
Running RALPH 5.19 on my P-M laptop. It showed 100% done at 3:14:58 CPU time, but there was no activity: the screensaver wouldn't come up (blank area only), and Task Manager showed all cycles going to the System Idle Process. Suspended activity and restarted BOINC: no joy. Closed BOINC Manager (BM), restarted the machine and brought it back up: no joy. Aborted the WU and BM went right back to work. Didn't want to wait ?16 hrs? (4x4) to see if it would somehow wake up and finish on its own!
wizzszz

Send message
Joined: 28 Apr 06
Posts: 17
Credit: 1,128
RAC: 0
Message 1746 - Posted: 2 Jun 2006, 14:20:18 UTC

The WU mentioned in my previous post showed the same behaviour:

BOINC reported 100%, the GUI about 98%! No CPU usage!

I had a similar problem with some HashClash units; maybe it is a problem with the BOINC 5.4.9 client?
Mike Gelvin

Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 1747 - Posted: 2 Jun 2006, 15:36:17 UTC - in response to Message 1745.  
Last modified: 2 Jun 2006, 15:36:50 UTC

My experience with this is that it will always finish within one hour of getting into this mode. Look at the messages to see when the WU started, and take the minutes and seconds portion of that time (like 5:10:20). When your PC time (wall time) reaches the same minutes and seconds (like 7:10:20), the work unit will complete. I know it's weird, but that's what I am seeing.

Mike
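The rule above can be written as a small calculation: the task exits at the first whole number of hours after its start time that falls at or after the moment it went dormant. This is only a sketch of the pattern reported in this thread, not of how the Rosetta application actually decides when to exit; `predicted_finish` is a hypothetical name, and the start and dormant times are assumed to come from the BOINC messages and your own observation.

```python
import math
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def predicted_finish(start: datetime, dormant_since: datetime) -> datetime:
    """Predict when a dormant WU exits, per the pattern reported in this thread.

    Returns the earliest time start + n hours (n >= 1) that is at or after
    dormant_since, i.e. the next wall-clock time whose minutes and seconds
    match the start time.
    """
    elapsed = dormant_since - start
    # timedelta / timedelta gives a float number of hours; round up to a whole hour.
    n = max(1, math.ceil(elapsed / HOUR))
    return start + n * HOUR


if __name__ == "__main__":
    # The example above: started at 5:10:20, went dormant at (say) 6:55:00.
    start = datetime(2006, 6, 2, 5, 10, 20)
    dormant = datetime(2006, 6, 2, 6, 55, 0)
    print(predicted_finish(start, dormant))  # 2006-06-02 07:10:20
```

For a task that stops computing after 54 minutes, this predicts an exit at the 1-hour mark, consistent with the roughly 5 minutes of idling tralala reported earlier in the thread.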

doc :)

Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 1748 - Posted: 2 Jun 2006, 16:42:45 UTC

That's definitely something that has to be fixed before it goes out on the main project. Of the four 5.19 units that I completed successfully (2-hour preference), 3 wasted 1 hour that could have been used elsewhere, and the 4th used at least a couple of extra minutes to reach the 2-hour mark.
Fuzzy Hollynoodles

Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1749 - Posted: 2 Jun 2006, 18:31:09 UTC - in response to Message 1747.  

Yes, this is really weird.

I have one stuck right now: https://ralph.bakerlab.org/workunit.php?wuid=129926

I took this screendump of the BOINC Manager: http://img.photobucket.com/albums/v680/lsh55/Setiweb/Ralph/BOINC_Mgr_Ralph.jpg


And this is the graphic: http://img.photobucket.com/albums/v680/lsh55/Setiweb/Ralph/RalphWU.jpg

This is really weird: the screen is scrolling very fast to the left, and this "band" is showing at the bottom of the screen.


I'll leave it as it is and see whether it finishes at that minute, as you say. Otherwise I'll abort it.


"I'm trying to maintain a shred of dignity in this world." - Me





©2024 University of Washington
http://www.bakerlab.org