Bug reports for 5.49-5.51

Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2832 - Posted: 5 Mar 2007, 18:43:06 UTC - in response to Message 2831.  

Yea, that's a big one -- I better not send it out again!

I'm tracking down a few other RNA workunits that have crashed. On the whole, things are looking good, though I'll probably need to run more tests, and do another app update this week.

This WU was at least at step 1,000,000 when it timed out!

https://ralph.bakerlab.org/result.php?resultid=444991

Anders n


ID: 2832
Profile anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2833 - Posted: 5 Mar 2007, 19:34:28 UTC

process exited with code 1 (0x1)


On all my new Wu-s like this one.

https://ralph.bakerlab.org/result.php?resultid=446166

Anders n
ID: 2833
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2834 - Posted: 5 Mar 2007, 20:31:01 UTC - in response to Message 2833.  

Thanks for continuing to post. It took a little effort to find this last bug in this set of new WUs -- I had to trap it on my laptop and read the stdout.txt. I think I fixed it, so I'm sending out a few new jobs.

process exited with code 1 (0x1)


On all my new Wu-s like this one.

https://ralph.bakerlab.org/result.php?resultid=446166

Anders n


ID: 2834
genes

Joined: 16 Feb 06
Posts: 45
Credit: 43,706
RAC: 20
Message 2835 - Posted: 6 Mar 2007, 3:11:51 UTC

Wow! The graphics are awesome! (not a bug) :-)

ID: 2835
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2836 - Posted: 6 Mar 2007, 4:17:24 UTC - in response to Message 2828.  
Last modified: 6 Mar 2007, 4:18:35 UTC

Something is strange with these Wu-s.

0 decoys from 0 attempts

Me too... v5.50, 0 decoys and yet exactly 30 nstructs, just as "anders n", and it appears to have run for the standard 10,000 seconds rather than my 24hr preference.
https://ralph.bakerlab.org/result.php?resultid=445489

ID: 2836
Profile anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2837 - Posted: 6 Mar 2007, 13:03:40 UTC - in response to Message 2832.  

Yea, that's a big one -- I better not send it out again!


No problem with me to send it again. :)

If possible the big ones should be sent to the "faster" computers.

Anders n

ID: 2837
Profile anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2839 - Posted: 7 Mar 2007, 6:35:04 UTC

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x008F2D4C read attempt to address 0x00000E51

On a number of Wu-s

Like this one.

https://ralph.bakerlab.org/result.php?resultid=450894

Anders n

ID: 2839
Viromancy

Joined: 20 Jan 07
Posts: 7
Credit: 1,425
RAC: 0
Message 2840 - Posted: 7 Mar 2007, 7:13:52 UTC

Out of seven WUs run under ver 5.50 so far, I've had three rapid failures within seconds of the run starting:

Two "Incorrect function. (0x1) - exit code 1 (0x1)" - 445054 and 446305

One access violation, which may have occurred when I tried to show graphics - 450818


ID: 2840
j2satx

Joined: 17 Feb 06
Posts: 42
Credit: 168,797
RAC: 0
Message 2841 - Posted: 7 Mar 2007, 11:59:38 UTC

Computer | Project | Date | ID | Message
6100M902 | ralph@home | 3/7/2007 4:52:11 AM | 518 | Reason: Unrecoverable error for result DOCKING_1rhj_SYMM_11rhj_1_d.s036_bigrun.out.85_1826_4_1 (process exited with code 131 (0x83))

ID: 2841
Michael Stoeter

Joined: 20 Feb 06
Posts: 1
Credit: 1,097,989
RAC: 0
Message 2842 - Posted: 7 Mar 2007, 13:50:28 UTC
Last modified: 7 Mar 2007, 13:53:44 UTC

Of my last 47 WUs, 41 ended with an error:
Exit status -1073741819 (0xffffffffc0000005)

https://ralph.bakerlab.org/workunit.php?wuid=398542
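
For anyone comparing the numbers in these reports: the signed status -1073741819, the 0xc0000005 quoted in other posts, and the sign-extended 0xffffffffc0000005 above are all the same 32-bit pattern, which Windows defines as STATUS_ACCESS_VIOLATION. A minimal Python sketch of the conversion follows. The helper names are made up for illustration and are not part of BOINC or Rosetta.

```python
def to_unsigned32(status: int) -> int:
    """Keep only the low 32 bits, so negative and sign-extended values agree."""
    return status & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    # Signed form, as shown in "Exit status -1073741819"
    print(hex(to_unsigned32(-1073741819)))
    # 64-bit sign-extended form, as shown in parentheses above
    print(hex(to_unsigned32(0xFFFFFFFFC0000005)))
    # And back to the signed form
    print(to_signed32(0xC0000005))
```

All three calls land on the same value, so reports that quote different forms of the number can be treated as the same kind of crash.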
ID: 2842
Viromancy

Joined: 20 Jan 07
Posts: 7
Credit: 1,425
RAC: 0
Message 2843 - Posted: 7 Mar 2007, 18:10:08 UTC

Another very rapid access violation in 5.50, this time without any attempt to view the graphics when the WU started: 451729

Half the WUs my machine has processed under 5.50 have now failed within seconds of starting. The "incorrect function" errors with the first two ab-initio RNA folding WUs seem to have stopped, but both of the access violation errors today have been with DOCKING_1rhj_SYMM_11rhj_1_d.s036_bigrun.out. units.

ID: 2843
Ingemar
Volunteer moderator
Project developer
Project scientist

Joined: 7 Mar 07
Posts: 9
Credit: 76
RAC: 0
Message 2844 - Posted: 7 Mar 2007, 22:19:40 UTC

Hi, the jobs named DOCK_SYMM uncovered a bug in the 5.50 release, and they all fail shortly after they start. This problem will be fixed in the next release, and no more jobs causing this problem will be submitted. Sorry for the inconvenience!
ID: 2844
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2845 - Posted: 8 Mar 2007, 1:57:17 UTC
Last modified: 8 Mar 2007, 1:57:41 UTC

One comment on the graphics. As a new model begins, the graphic shows the strand and gradually scales down during the first few steps... as that happens, the text in the box scales down as well. With the DOC and HINGE WUs, those text labels (Searching, Low Energy, Native...) can get PRETTY small.
ID: 2845
Michael.L

Joined: 26 Nov 06
Posts: 5
Credit: 1,173
RAC: 0
Message 2846 - Posted: 8 Mar 2007, 22:50:22 UTC
Last modified: 8 Mar 2007, 22:51:12 UTC

08/03/2007 22:46:10|ralph@home|Unrecoverable error for result 1ywz_1_NMRREF_1_1ywz_1_idid_model_12IGNORE_THE_REST_idl_1831_8_0 ( - exit code -1073741819 (0xc0000005))

WU ran for only 48 seconds before failing.

I think the new graphics are great, no problems.
ID: 2846
Michael.L

Joined: 26 Nov 06
Posts: 5
Credit: 1,173
RAC: 0
Message 2847 - Posted: 9 Mar 2007, 0:11:16 UTC
Last modified: 9 Mar 2007, 0:13:24 UTC

08/03/2007 23:47:48|ralph@home|Unrecoverable error for result 1ywz_1_NMRREF_1_1ywz_1_idid_model_11IGNORE_THE_REST_idl_1831_8_0 ( - exit code -1073741819 (0xc0000005))

08/03/2007 23:58:39|ralph@home|Unrecoverable error for result 1ywz_1_NMRREF_1_1ywz_1_idid_model_01IGNORE_THE_REST_idl_1831_2_2 ( - exit code -1073741819 (0xc0000005))

The above two ran for about 46 and 47 seconds each
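
If anyone wants to tally these failures from a saved message log, here is a rough Python sketch that pulls the timestamp, project, result name and exit code out of lines shaped like the ones above. It assumes the date is DD/MM/YYYY (these lines were posted on 8 and 9 March 2007) and it only matches this one "Unrecoverable error for result" layout. The function and field names are hypothetical, not anything BOINC provides.

```python
import re
from datetime import datetime

# Matches: DD/MM/YYYY HH:MM:SS|project|Unrecoverable error for result NAME ( - exit code N (0xHEX))
LINE_RE = re.compile(
    r"^(?P<when>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|"
    r"(?P<project>[^|]+)\|"
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(.*?exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)$"
)


def parse_error_line(line):
    """Return a dict of fields, or None if the line has a different shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "when": datetime.strptime(m["when"], "%d/%m/%Y %H:%M:%S"),
        "project": m["project"],
        "result": m["result"],
        "exit_code": int(m["code"]),
        "exit_hex": m["hex"].lower(),
    }


if __name__ == "__main__":
    sample = (
        "08/03/2007 22:46:10|ralph@home|Unrecoverable error for result "
        "1ywz_1_NMRREF_1_1ywz_1_idid_model_12IGNORE_THE_REST_idl_1831_8_0 "
        "( - exit code -1073741819 (0xc0000005))"
    )
    print(parse_error_line(sample))
```

Grouping the parsed records by exit_code or by the leading part of the result name would show quickly whether one batch of WUs accounts for most of the failures.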
ID: 2847
Michael.L

Joined: 26 Nov 06
Posts: 5
Credit: 1,173
RAC: 0
Message 2848 - Posted: 9 Mar 2007, 0:15:43 UTC
Last modified: 9 Mar 2007, 0:20:30 UTC

Sends feet1st my spare reading glasses.
ID: 2848
[B^S] sTrey

Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 2849 - Posted: 9 Mar 2007, 16:47:56 UTC
Last modified: 9 Mar 2007, 16:48:38 UTC

My last two WUs, one last night and one this morning, both errored out almost immediately with:
- exit code -1073741819 (0xc0000005)
ERROR:: Unable to determine sequence length from pdb file

454165
453634
ID: 2849
genes

Joined: 16 Feb 06
Posts: 45
Credit: 43,706
RAC: 20
Message 2850 - Posted: 10 Mar 2007, 2:40:23 UTC

Some more access violations...

453604
453706
453803
454246

None of them ran longer than 2 minutes.

ID: 2850
Thomas Leibold

Joined: 25 Feb 07
Posts: 27
Credit: 77,464
RAC: 0
Message 2851 - Posted: 10 Mar 2007, 4:26:52 UTC
Last modified: 10 Mar 2007, 4:27:40 UTC

400678

400776

400748

On the same workunits where Windows users are getting "exit code -1073741819 (0xc0000005)" and "- Unhandled Exception Record - Reason: Access Violation (0xc0000005)", I'm getting "process exited with code 131 (0x83)" and a segmentation violation on Linux:

ERROR:: Unable to determine sequence length from pdb file
# random seed: 2719130
SIGSEGV: segmentation violation
Stack trace (13 frames):
[0x8b95623]
[0x8bb146c]
[0xffffe420]
[0x8857337]
[0x861bad6]
[0x8620366]
[0x86222e3]
[0x8973172]
[0x8529c73]
[0x8641c32]
[0x8641cdc]
[0x8c10a94]
[0x8048111]

Exiting...
ID: 2851
RickH

Joined: 10 Aug 06
Posts: 5
Credit: 7,260
RAC: 0
Message 2852 - Posted: 10 Mar 2007, 4:40:21 UTC
Last modified: 10 Mar 2007, 4:59:43 UTC

Just noticed WU ID 400842, DOC_1DFJ_R070309_pose_b_pert_fixbb_score12_1832_6 has been stuck doing no work for over an hour, while the RALPH science app repeatedly tries to directly connect to 207.46.212.122:80 for some reason.

I don't know why a science app would be directly using the internet connection instead of letting BOINC handle the file transfers like usual, but it's not a good idea. My software firewall (Comodo) is set to require individual app-by-app approval for internet access (to prevent trojans from phoning home), and since rosetta_beta_5.50_windows_intelx86.exe is not on the approved list, it keeps denying the program's access and the app is apparently stuck spinning its wheels. Currently at 20+ access errors logged and counting, repeated every 4 minutes or so.

I could of course approve the app for internet access, but that won't work long term, since soon enough it'll be upgraded to rosetta_beta_5.51... and the game will start all over again. Even if I wanted to do this, each app upgrade will result in hours of wasted CPU time waiting for me to notice each new version's approval request popups, along with filling up the firewall's access list with dozens of sequential app names. Blech.

I guess I'll have to approve this one, since it's either that or abort it, but 5.51 needs to either not require direct internet access, or at least fail gracefully if it's denied.
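
To show what I mean by "fail gracefully", here is a generic Python sketch: try the connection a bounded number of times with a timeout, then give up and carry on without it instead of retrying forever. I have no idea how or why the app actually opens its connection, so this is only an illustration of the pattern. The function name, the attempt count and the injectable `connect` parameter are all my own assumptions.

```python
import socket


def connect_with_limit(host, port, attempts=3, timeout=5.0,
                       connect=socket.create_connection):
    """Return an open connection, or None once every attempt has failed.

    Refused, firewalled and timed-out connections all raise OSError
    (socket.timeout is a subclass), so one except clause covers them.
    """
    for _ in range(attempts):
        try:
            return connect((host, port), timeout=timeout)
        except OSError:
            continue  # blocked or unreachable: try again, up to the limit
    return None  # caller should continue the real work without the network


if __name__ == "__main__":
    # "example.invalid" is a placeholder host that never resolves.
    conn = connect_with_limit("example.invalid", 80, attempts=2, timeout=1.0)
    if conn is None:
        print("No network access; continuing without it.")
    else:
        conn.close()
```

The point is that a denied connection ends up as a normal return value the caller has to handle, so the work unit can keep crunching (or exit with a clear error) rather than sit idle.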

ID: 2852



©2024 University of Washington
http://www.bakerlab.org