Message boards : Current tests : Test spotted
Author | Message |
---|---|
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
Pity this place is full of spam. Here's a real post. I got 1 task on each of 2 machines. A 5 GB VDI image was downloaded for Python. The 1st machine reported a download failure; not sure why. The computer may have crashed of its own accord, as it has a dodgy old GPU. https://ralph.bakerlab.org/results.php?hostid=49339 The 2nd machine downloaded it OK, but the task hit a computation error immediately. https://ralph.bakerlab.org/results.php?hostid=48821 |
aendgraend Joined: 24 Oct 08 Posts: 5 Credit: 330,528 RAC: 0 |
My host 37110 got several WUs, one of which has been running for 21+ hours now. AFAIK one workunit always completes at least one model, so my question is: are these running times expected? If so, maybe the deadline should be adjusted, as the other WUs waiting on my host won't even have started before the deadline passes (so I will sadly need to cancel them, which I normally wouldn't do). Here are the details of the current WU (which has been at 100% progress since yesterday evening).
|
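Whether queued WUs can start before their deadline comes down to simple arithmetic: on one core, each queued task only starts when the previous one finishes. Below is a minimal Python sketch under stated assumptions: runtimes are known in advance (the BOINC client itself only has estimates), tasks run in queue order, and the helper name and example numbers are hypothetical.

```python
from datetime import datetime, timedelta

def tasks_missing_deadline(start, runtimes_hours, deadline, cores=1):
    """Return the queue indices of tasks that would finish after `deadline`.

    Hypothetical helper: tasks start in list order on `cores` parallel
    slots, and each slot takes the next task as soon as it becomes free.
    """
    slots = [start] * cores  # time at which each slot becomes free
    late = []
    for i, hours in enumerate(runtimes_hours):
        # Give the task to whichever slot frees up earliest.
        j = min(range(cores), key=lambda k: slots[k])
        finish = slots[j] + timedelta(hours=hours)
        slots[j] = finish
        if finish > deadline:
            late.append(i)
    return late

start = datetime(2024, 1, 1)
deadline = start + timedelta(days=3)
print(tasks_missing_deadline(start, [24, 24, 24, 24], deadline))
```

With a 3-day deadline and four tasks of about 24 hours each on one core, the fourth task would finish at hour 96 and miss the deadline; passing `cores=4` lets all four finish by hour 24.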
Fardringle Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901 |
I received a total of 12 tasks from this test batch. Five completed successfully. Seven failed because the virtual machine became "stale" when the application thought it was stuck. In each of the failed tasks, it looks like the VM task was put into a paused state instead of shutting down, and then, because it was in an unexpected state, the application/task marked itself as invalid. This does not seem to be related to the actual run time: some of the successful tasks completed in about 15-16 minutes, others took several hours to finish, and the failed tasks have a pretty wide variety of run times as well. https://ralph.bakerlab.org/results.php?userid=717&offset=0&show_names=0&state=6&appid= |
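To check the VM state Fardringle describes on your own host, VirtualBox's `VBoxManage showvminfo <vm> --machinereadable` prints a `VMState="..."` line (values such as `running`, `paused`, `poweroff`). A minimal Python sketch; the helper names are hypothetical, the VM name is whatever VirtualBox Manager lists for the task, and this only reports the state. It does not reproduce how the RALPH/BOINC wrapper decides a VM is stale.

```python
import subprocess

def parse_vm_state(machinereadable_text):
    """Return the VMState value from `showvminfo --machinereadable` output, or None."""
    for line in machinereadable_text.splitlines():
        # Lines look like: VMState="paused"
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VMState":
            return value.strip().strip('"')
    return None

def get_vm_state(vm_name):
    """Ask VirtualBox for a VM's state; requires VBoxManage on PATH."""
    result = subprocess.run(
        ["VBoxManage", "showvminfo", vm_name, "--machinereadable"],
        capture_output=True, text=True, check=True,
    )
    return parse_vm_state(result.stdout)
```

Seeing `paused` from `get_vm_state()` for a task VM that should be `running` or `poweroff` would match the failure pattern described above.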
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
I'm seeing these take up to a day (and still not finished), but on an old slow computer (unfortunately only that one seems to get Ralph). I guess I should let them run to the end? I was cancelling anything lasting much longer than the standard 4/8/12 hours we see on Rosetta. They are still using a full CPU core each. The guidelines do also say "Rule 1. Please do not abort work units.", although those are from 2006. |
Fardringle Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901 |
Yes, let them try to run to completion, especially on an older computer that might take a while. I had a few take 4-6 hours on my pretty fast Ryzen 9 3900X, so a slow computer could easily take several times that long to finish. |
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
Some will presumably be auto-cancelled by BOINC because they won't start in time. BOINC is doing the usual trick of downloading enough work for 4 cores when only one is allowed in the preferences (25% usage). I'm sick to death of reporting schoolboy errors to BOINC programmers. |
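The over-fetch described above is easy to quantify. A minimal Python sketch, assuming the "use at most N% of the CPUs" setting is rounded down with a floor of one core, and that the queue is sized as cores × buffer time ÷ task length; this is a simplification, not BOINC's real work-fetch logic, and the helper names and one-day buffer are hypothetical.

```python
import math

def usable_cores(total_cores, cpu_pct):
    """Cores allowed by a 'use at most cpu_pct % of the CPUs' preference.

    Assumption: the result is rounded down but never drops below one core.
    """
    return max(1, math.floor(total_cores * cpu_pct / 100))

def tasks_to_fetch(cores, buffer_days, task_hours):
    """Tasks needed to keep `cores` busy for `buffer_days` (simplified model)."""
    return math.ceil(cores * buffer_days * 24 / task_hours)

total, pct = 4, 25                              # 4-core host limited to 25%
allowed = usable_cores(total, pct)
fetched_all = tasks_to_fetch(total, 1, 24)      # queue sized for every core
fetched_ok = tasks_to_fetch(allowed, 1, 24)     # queue sized for allowed cores
print(f"allowed cores: {allowed}, fetched: {fetched_all}, needed: {fetched_ok}")
```

With roughly 24-hour tasks, sizing the queue for all 4 cores fetches 4 tasks where 1 would do, so the extra tasks can sit unstarted until their deadline passes.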
aendgraend Joined: 24 Oct 08 Posts: 5 Credit: 330,528 RAC: 0 |
The WU I mentioned is still running and still at 100% progress.
As my profile doesn't show any workunits in progress (all are shown as 'Timeout - no response'), I'm sadly going to cancel it. |
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
"Timeout - no response" means you didn't send it back on time. It will still be accepted if you send it in late; it just means the server will have sent it to someone else, who probably has the same problem. So I think you should leave it running. I'm leaving mine on. |
marmot Joined: 24 May 19 Posts: 1 Credit: 2,183 RAC: 0 |
Been trying to get one of these on my laptop, which is attached to all my projects. It's a 10th-gen Intel that successfully runs my own VMs (Win 11 and Win 10 with minimal services, and a tiny Linux install). I haven't been able to get a single Python WU. One project's WUs don't work with VBox 7+ yet, so it's on VBox 6.1.42. Is this the most likely reason the server isn't sending the WU? Of course, this client might just have missed every single batch and rerun that was put out. I've noticed the beta servers tend to send work to clients that have been attached the longest and have prior credit. This one has no credit.

<scheduler_reply>
  <scheduler_version>707</scheduler_version>
  <dont_use_dcf/>
  <master_url>https://ralph.bakerlab.org/</master_url>
  <request_delay>31.000000</request_delay>
  <message priority="low">No tasks sent</message>
  <project_name>ralph@home</project_name>
  <symstore>https://boinc.bakerlab.org/rosetta/symstore</symstore>
  <user_name>marmot</user_name>
  <user_total_credit>2182.807721</user_total_credit>
  <user_expavg_credit>0.099605</user_expavg_credit>
  <project_preferences>
    <resource_share>100</resource_share>
    <no_cpu>0</no_cpu>
    <project_specific>
      <max_fps>20</max_fps>
      <max_cpu>20</max_cpu>
      <cpu_run_time>86400</cpu_run_time>
      <bg>2</bg>
    </project_specific>
  </project_preferences>
  <host_total_credit>0.000000</host_total_credit>
  <host_expavg_credit>0.000000</host_expavg_credit>
  <host_venue></host_venue>
  <host_create_time>1666233677</host_create_time> |
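A reply like the one marmot pasted can be inspected programmatically. A minimal Python sketch using the standard library's `xml.etree.ElementTree`; the sample is a trimmed copy of fields from the pasted reply, with a closing `</scheduler_reply>` added so it parses (the paste itself is cut off). Reading `request_delay` as a wait before the next server contact, and `cpu_run_time` as a target run time in seconds, are assumptions based on the field names and the value 86400.

```python
import xml.etree.ElementTree as ET

# Trimmed from the pasted reply; closing tag added so it is well-formed.
SAMPLE_REPLY = """<scheduler_reply>
  <scheduler_version>707</scheduler_version>
  <request_delay>31.000000</request_delay>
  <message priority="low">No tasks sent</message>
  <project_preferences>
    <project_specific>
      <cpu_run_time>86400</cpu_run_time>
    </project_specific>
  </project_preferences>
  <host_total_credit>0.000000</host_total_credit>
</scheduler_reply>"""

def summarize_reply(xml_text):
    """Extract a few fields from a scheduler reply (field meanings assumed)."""
    root = ET.fromstring(xml_text)
    run_time = root.findtext("project_preferences/project_specific/cpu_run_time")
    return {
        # Text of every top-level <message> element, e.g. "No tasks sent".
        "messages": [m.text for m in root.findall("message")],
        # Assumed: seconds to wait before contacting the server again.
        "request_delay_s": float(root.findtext("request_delay", default="0")),
        "host_total_credit": float(root.findtext("host_total_credit", default="0")),
        # Assumed: target CPU run time in seconds, converted to hours.
        "cpu_run_time_h": float(run_time) / 3600 if run_time is not None else None,
    }

print(summarize_reply(SAMPLE_REPLY))
```

Pointing `summarize_reply()` at a complete saved reply shows at a glance whether the server said "No tasks sent" and that this host still has zero credit.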
©2024 University of Washington
http://www.bakerlab.org