Bug reports for 5.49-5.51

RickH
Joined: 10 Aug 06
Posts: 5
Credit: 7,260
RAC: 0
Message 2853 - Posted: 10 Mar 2007, 4:50:06 UTC
Last modified: 10 Mar 2007, 4:52:06 UTC

Figures. I just approved the internet access, and it immediately crashes with the 0xC0000005 fault that's going around. Oh, well.

454295
ID: 2853

feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2854 - Posted: 10 Mar 2007, 5:53:29 UTC - in response to Message 2852.  

...why a science app would be directly using the internet connection instead of letting BOINC handle the file transfers...


See discussion here: https://boinc.bakerlab.org/forum_thread.php?id=1755&nowrap=true#32219

I could of course approve the app for internet access, but that won't work long term, since soon enough it'll be upgraded to rosetta_beta_5.51... and the game will start all over again.


Yep. The good news, if you want to call it that, is that the task had failed prior to the firewall challenge. So, the application not being approved in the firewall is not what caused it to fail.
ID: 2854

RickH
Joined: 10 Aug 06
Posts: 5
Credit: 7,260
RAC: 0
Message 2855 - Posted: 10 Mar 2007, 11:59:57 UTC

Oh, I see. What a pain. My firewall doesn't support app name wildcards, and there's no way to proactively say "any app that wants to connect to x.y.z.t is allowed, even if you've never heard of it before." The app approval seems to be implemented as a separate layer; first the app is checked to see if it's allowed to use the internet at all, then if so, the packets are run through the packet-level rules as they go out.
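
The two-layer behaviour described above can be sketched roughly as follows. This is only an illustrative model, not the code of any real firewall product: the `Firewall` class, its field names, the executable names, and the address range (a reserved documentation range) are all assumptions.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class Firewall:
    # Layer 1: executables the user has explicitly approved (exact names, no wildcards).
    approved_apps: set = field(default_factory=set)
    # Layer 2: destination networks permitted by the packet-level rules.
    allowed_networks: list = field(default_factory=list)

    def allows(self, app_name: str, dest_ip: str) -> bool:
        # Layer 1 runs first: an unknown executable is stopped here,
        # regardless of what the packet-level rules would have said.
        if app_name not in self.approved_apps:
            return False
        # Layer 2 is consulted only for approved executables.
        addr = ip_address(dest_ip)
        return any(addr in net for net in self.allowed_networks)


fw = Firewall(
    approved_apps={"rosetta_beta_5.50"},            # hypothetical executable name
    allowed_networks=[ip_network("192.0.2.0/24")],  # placeholder documentation range
)

print(fw.allows("rosetta_beta_5.50", "192.0.2.10"))  # True: approved app, allowed destination
print(fw.allows("rosetta_beta_5.51", "192.0.2.10"))  # False: new version fails layer 1
```

In this model a new executable name is rejected before the destination is ever examined, which is why a packet rule for a trusted server cannot pre-approve future versions.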

It looks like I'm stuck with losing many hours of CPU time the first time any new version of the science app aborts, along with it stuffing the error log and approval list full of spam. Argh.

ID: 2855

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2857 - Posted: 12 Mar 2007, 7:16:02 UTC - in response to Message 2851.  
Last modified: 12 Mar 2007, 7:17:12 UTC

I've contacted Vatsan; hopefully he'll reply tomorrow about what's wrong with these workunits.
In the meantime, the application has been updated to 5.51! Not many changes this time, just a very slight fix to allow us to send out symmetric docking workunits when we don't really know the native structure.


400678

400776

400748

On the same workunits where Windows users are getting "exit code -1073741819 (0xc0000005)" and "- Unhandled Exception Record - Reason: Access Violation (0xc0000005)", I'm getting "process exited with code 131 (0x83)" and a segmentation violation on Linux:

ERROR:: Unable to determine sequence length from pdb file
# random seed: 2719130
SIGSEGV: segmentation violation
Stack trace (13 frames):
[0x8b95623]
[0x8bb146c]
[0xffffe420]
[0x8857337]
[0x861bad6]
[0x8620366]
[0x86222e3]
[0x8973172]
[0x8529c73]
[0x8641c32]
[0x8641cdc]
[0x8c10a94]
[0x8048111]

Exiting...
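
The two Windows messages quoted above report the same value: 0xC0000005 is the Windows access-violation status, and -1073741819 is that same 32-bit pattern read as a signed integer. Likewise, 0x83 is simply 131 in hexadecimal. A minimal sketch of the arithmetic (the helper names are illustrative and not part of BOINC or Rosetta):

```python
def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit pattern as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


print(to_signed32(0xC0000005))          # -1073741819
print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
print(hex(131))                         # 0x83
```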


ID: 2857

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 2858 - Posted: 12 Mar 2007, 7:35:15 UTC

Faulty WU, exit code 131

https://ralph.bakerlab.org/workunit.php?wuid=400654
ID: 2858

Vatsan
Volunteer moderator
Project developer
Project scientist
Joined: 12 Mar 07
Posts: 1
Credit: 0
RAC: 0
Message 2859 - Posted: 12 Mar 2007, 17:41:38 UTC

WU: 400678, 400776, etc. I am sorry the WUs crashed. I tested the jobs on my desktop before sending them out to Ralph. There was a small error in renumbering the sequence numbers in the PDB structure. The jobs ran on my desktop despite this discrepancy. I've fixed it and resubmitted the jobs, and they ran cleanly. Sorry for the inconvenience.
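
The post does not say exactly what the renumbering error was, so the following is only a hypothetical pre-submission check, not the project's actual tooling. It assumes the fixed-column PDB layout (chain ID in column 22, residue sequence number in columns 23-26) and flags places where the numbering within a chain does not increase by exactly one. The function names and the `atom_line` demo helper are made up for illustration.

```python
def residue_numbers(pdb_text: str, chain: str = "A") -> list:
    """Return the distinct residue sequence numbers of one chain, in file order."""
    numbers = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: chain ID is column 22, resSeq is columns 23-26.
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain:
            num = int(line[22:26])
            if not numbers or numbers[-1] != num:
                numbers.append(num)
    return numbers


def numbering_gaps(numbers: list) -> list:
    """Return (previous, current) pairs where numbering does not step by exactly 1."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]


def atom_line(serial: int, res_seq: int, chain: str = "A") -> str:
    # Demo helper: build the first 26 columns of an ATOM record for a "P" atom of residue G.
    return "ATOM  " + f"{serial:>5}" + " " + " P  " + " " + "  G" + " " + chain + f"{res_seq:>4}"


sample = "\n".join([atom_line(1, 1), atom_line(2, 2), atom_line(3, 4)])
print(residue_numbers(sample))                  # [1, 2, 4]
print(numbering_gaps(residue_numbers(sample)))  # [(2, 4)]
```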
ID: 2859

Viromancy
Joined: 20 Jan 07
Posts: 7
Credit: 1,425
RAC: 0
Message 2860 - Posted: 12 Mar 2007, 19:46:33 UTC

Very short runtime in 5.51 for an ab initio RNA WU that generated 0 decoys from 0 attempts, followed by a validation error:

https://ralph.bakerlab.org/result.php?resultid=456095

ID: 2860

[B^S] sTrey
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 2861 - Posted: 14 Mar 2007, 2:38:00 UTC
Last modified: 14 Mar 2007, 2:41:08 UTC

This result has been running for close to 6 hours, is still racking up CPU time, and says it's at 1%. My preference setting is 4 hours. I just suspended it, and I have to reboot for a Windows Update. If it doesn't seem more sane after that I'll abort it, unless advised to let it run.
ID: 2861

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2862 - Posted: 14 Mar 2007, 4:17:28 UTC - in response to Message 2861.  

No, it looks like a lot of users have not been able to return results for this WU due to timeouts. I'm sending out some more that require less computation; let's see how those go.



ID: 2862

Rolly
Joined: 7 May 06
Posts: 2
Credit: 24,104
RAC: 0
Message 2863 - Posted: 14 Mar 2007, 9:17:46 UTC

This workunit has been running for three hours and is still initializing. It seems to be running fine, with almost 5 million steps calculated and moving graphs of folding RNA, but I am worried about the lack of progress.

Jorn
ID: 2863

BdP
Joined: 5 Mar 07
Posts: 1
Credit: 193
RAC: 0
Message 2864 - Posted: 14 Mar 2007, 14:40:52 UTC

You might wanna check this WU type: 1xjr__BOINC_INCREASE_CYCLES_RNA_ABINITIO-1xjr_-_1843_13_0. It might be stuck in an infinite loop... I mean, it has been running for over an hour and a half (in my prefs I've selected a 1 h target time), and it's at step no. 1700000 while the stage still says "initializing"... I'm gonna abort it now.
ID: 2864

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2865 - Posted: 14 Mar 2007, 19:04:16 UTC - in response to Message 2864.  

Thanks for posting. There was a definite problem with an early round of these workunits. Most of these workunits are now returning fine, but I'll make sure.

[BTW, I'm fixing the "Initializing..." bug, and it will go out on the next application update.]



ID: 2865


©2024 University of Washington
http://www.bakerlab.org