Message boards : RALPH@home bug list : 64bit app just added.
Author | Message |
---|---|
FluffyChicken Joined: 17 Feb 06 Posts: 54 Credit: 710 RAC: 0 |
I don't see a post about this, but how are you implementing it? I know it's a copy of the 32-bit app, but what platform identifier does it use? The official (or soon-to-be official) one is x86_64; I know some projects do not use that and will need to change with the newer server code. |
FluffyChicken Joined: 17 Feb 06 Posts: 54 Credit: 710 RAC: 0 |
OK, I just checked the download page and it's x86_64 :) for both Linux and Windows, I see. |
PieBandit Joined: 3 May 07 Posts: 9 Credit: 7,592 RAC: 0 |
Has anyone with a 64 bit linux machine gotten any work units? I just get messages 'No Work From Project' |
JKeck {pirate} Joined: 16 Feb 06 Posts: 14 Credit: 153,095 RAC: 0 |
I had several RPCs that gave no message and no work. The most recent RPC gave the message "no work from project" and has 2 hours on the deferral. This is WinXP 64. BOINC WIKI BOINCing since 2002/12/8 |
Trog Dog Joined: 8 Aug 06 Posts: 38 Credit: 41,996 RAC: 0 |
"Has anyone with a 64 bit linux machine gotten any work units? I just get messages 'No Work From Project'" Yep, each of my 64bit boxes grabbed a WU. Remember, RALPH doesn't always have work. |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
So are there any issues with the 64bit apps or shall I go ahead and release them on Rosetta@home? |
PieBandit Joined: 3 May 07 Posts: 9 Credit: 7,592 RAC: 0 |
I was able to download the app, but never got work units. I had wanted to test because right now I am attempting to run Rosetta with an anonymous client (the 32 bit app) and it's failing mid work unit. Time spent working on the unit does not increase and % complete does not increase, but the state goes from waiting to running to waiting as other jobs take their share of my processor. Would it be possible for you to release more work units on the 64 bit app, if you suspect it is an issue that would affect the 64 bit clients? You can look at my results here if they might provide some insight: https://boinc.bakerlab.org/rosetta/results.php?userid=165650 My setup: AMD64 X2, Kubuntu 7.04, 2 GB RAM |
JKeck {pirate} Joined: 16 Feb 06 Posts: 14 Credit: 153,095 RAC: 0 |
"So are there any issues with the 64bit apps or shall I go ahead and release them on Rosetta@home?" I have a few on win x86_64 that returned successfully and were granted credit, so it is probably OK. One point though: if the app is simply the 32 bit app renamed, then it may be a better solution to update the server instead of trying to build and release a separate app. Server versions later than 4/30/7 should automatically send the 32 bit app to official 64 bit clients. This does not help the *nix boxes now though; there is not yet an official 64bit test app for *nix. I would expect there to be one by the time it is actually released as 5.10.x. BOINC WIKI BOINCing since 2002/12/8 |
Stefan Ledwina Joined: 16 Feb 06 Posts: 2 Credit: 155,221 RAC: 0 |
Got 4 WUs today and all are running fine. One of them is already crunched and is valid... This is my 64bit Linux host: https://ralph.bakerlab.org/show_host_detail.php?hostid=7960 |
PieBandit Joined: 3 May 07 Posts: 9 Credit: 7,592 RAC: 0 |
I got 3, and one of them has apparently completed successfully: <https://ralph.bakerlab.org/result.php?resultid=502025>. So looking forward to 64 bit Rosetta! |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
Great, sounds like we can release them on R@h. We do realize the next BOINC client update will be smart enough to get the 32 bit app if a 64 bit one doesn't exist, but what we are currently doing is simple enough. We will update our server soon, though, for other scheduling benefits. |
JKeck {pirate} Joined: 16 Feb 06 Posts: 14 Credit: 153,095 RAC: 0 |
I may have spoken too soon. The host has gotten about half compute errors. One of them can be ignored since it occurred during a core client upgrade; however, they all show the same error messages. BOINC WIKI BOINCing since 2002/12/8 |
PieBandit Joined: 3 May 07 Posts: 9 Credit: 7,592 RAC: 0 |
"I may have spoken too soon. Host has gotten about half compute errors. One of them can be ignored since it occurred during a core client upgrade, however they all show the same error messages." If you look at them, it happened to all computers on those work units. Some were not on 64 bit. My other two work units are still running, no info yet. |
Trog Dog Joined: 8 Aug 06 Posts: 38 Credit: 41,996 RAC: 0 |
No probs here with the 64bit app - haven't checked graphics though. |
JKeck {pirate} Joined: 16 Feb 06 Posts: 14 Credit: 153,095 RAC: 0 |
@PieBandit Thanks for noticing that; I did not look at what the other hosts did. @Trog Dog I have run the graphics, no related problems observed. BOINC WIKI BOINCing since 2002/12/8 |
PieBandit Joined: 3 May 07 Posts: 9 Credit: 7,592 RAC: 0 |
My initial 3 work units completed, though they have this in the result: https://ralph.bakerlab.org/result.php?resultid=502647 <core_client_version>5.4.11</core_client_version> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 3600 # random seed: 2680278 SIGSEGV: segmentation violation Graphics are disabled due to configuration... # cpu_run_time_pref: 3600 SIGABRT: abort called Stack trace (14 frames): [0x8cbe15f] [0x8cb8fac] [0xffffe500] [0x8d29014] [0x8d3dece] [0x8d430f1] [0x8d432c7] [0x8d134f5] [0x83d66c0] [0x8d2951f] [0x8cbac23] [0x8cbc1f3] [0x8cb522d] [0x8d5567a] Exiting... Graphics are disabled due to configuration... # cpu_run_time_pref: 3600 ====================================================== DONE :: 1 starting structures 2883.67 cpu seconds This process generated 4 decoys from 4 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> Should that really be valid? I also received this error on a new WU: https://ralph.bakerlab.org/result.php?resultid=506041 |
Thomas Leibold Joined: 25 Feb 07 Posts: 27 Credit: 77,464 RAC: 0 |
It looks like your client produced 4 models (possibly before the error?) and therefore should get credit for those. The issue you mention in your previous post ('running' state, but CPU time not increasing) sounds like something I see when the preference "leave application in memory" is set to "no" and BOINC switches tasks between different projects or suspends a task because of other activity on the system. |
PieBandit Joined: 3 May 07 Posts: 9 Credit: 7,592 RAC: 0 |
I've set the property to yes and will see if that fixes it. I've found a temporary workaround of manually suspending the task for several hours. It usually picks up again after I resume it. |
PieBandit Joined: 3 May 07 Posts: 9 Credit: 7,592 RAC: 0 |
This doesn't seem to be the case. The process isn't even running (when Rosetta claims to be running, top doesn't show it; it shows all other tasks). I'm very confused now. |
Thomas Leibold Joined: 25 Feb 07 Posts: 27 Credit: 77,464 RAC: 0 |
"This doesn't seem to be the case. the process isn't even running (when rosetta claims to be running, top doesn't show it. it shows all other tasks). I'm very confused now" With the 'keep application in memory' option on this is rare, but it still seems to happen occasionally. I recently had one workunit in that state with the 32-bit Linux client. What appears to be happening is that during an abnormal shutdown the communication between the processes for the task (you may have seen that each Rosetta task uses 4 processes, of which one does all the computation and therefore consumes most of the CPU time) gets into a state where they wait for each other indefinitely. Suspending and resuming the task in that situation does not help (except that BOINC will run another task while this stuck one is suspended). The Rosetta watchdog timer doesn't help either, since it is one of the processes involved in the communications deadlock. BOINC itself is unaware of the problem since the processes still exist (the fact that they don't actually consume CPU cycles isn't something the BOINC client monitors, nor could it do that without introducing dependencies on the various project clients). Stopping and restarting the entire BOINC client with all project tasks will restart such a workunit at the last checkpoint before it encountered the problem. If the problem is repeatable, the workunit will be processed until it once again reaches the trouble spot and will again stop consuming CPU time. In such a case the only remedy is to abort the workunit. |
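
The stuck state described in the last post (processes still exist, the task is reported as running, but CPU time stops increasing) can be spotted from outside by comparing wall-clock time against accumulated CPU time. The following is only a minimal sketch of that idea, not something the BOINC client or Rosetta does. The function name `is_stalled`, the sample format, and the thresholds are assumptions made for illustration. How the samples are collected (for example by reading the task's CPU time from `top` at intervals) is left out.

```python
from typing import List, Tuple

# One observation of a task: (wall-clock seconds, cumulative CPU seconds).
# Hypothetical format chosen for this sketch.
Sample = Tuple[float, float]


def is_stalled(samples: List[Sample],
               window: float = 600.0,
               min_cpu: float = 1.0) -> bool:
    """Return True if the task gained less than `min_cpu` CPU seconds
    over at least `window` seconds of wall-clock time.

    `samples` must be in chronological order and should only cover
    periods in which the task was reported as running.
    """
    if len(samples) < 2:
        return False  # nothing to compare yet

    latest_wall, latest_cpu = samples[-1]

    # Walk backwards to the newest sample that is at least `window`
    # seconds older than the latest one, then compare CPU time.
    for wall, cpu in reversed(samples[:-1]):
        if latest_wall - wall >= window:
            return (latest_cpu - cpu) < min_cpu

    return False  # not enough history to cover the window


if __name__ == "__main__":
    healthy = [(0.0, 0.0), (300.0, 290.0), (600.0, 580.0)]
    stuck = [(0.0, 100.0), (300.0, 100.0), (600.0, 100.2)]
    print("healthy stalled?", is_stalled(healthy))
    print("stuck stalled?", is_stalled(stuck))
```

A task that is legitimately waiting while another project has the processor also accumulates no CPU time, so samples taken in that state would produce false alarms. A positive result only suggests trying the remedies from the post: restart the whole client, and abort the workunit if it stalls again at the same point.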
©2024 University of Washington
http://www.bakerlab.org