Test spotted

Author	Message
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3	Message 7428 - Posted: 31 Oct 2023, 5:36:51 UTC
Pity this place is full of spam. Here's a real post. I got 1 task on each of 2 machines. A 5 GB VDI image downloaded for Python.
1st machine said download failure, not sure why. The computer may have crashed of its own accord; it has a dodgy old GPU. https://ralph.bakerlab.org/results.php?hostid=49339
2nd machine got it OK, but it caused a computation error immediately. https://ralph.bakerlab.org/results.php?hostid=48821

aendgraend Joined: 24 Oct 08 Posts: 5 Credit: 330,528 RAC: 0	Message 7432 - Posted: 10 Nov 2023, 10:18:04 UTC - in response to Message 7428. Last modified: 10 Nov 2023, 10:20:50 UTC
My host 37110 got several WUs, one of which has been running for 21+ hours now. AFAIK one workunit always completes at least one model, so my question is: are those running times expected? If so, maybe the deadline should be adjusted, as the other WUs waiting on my host won't even have started before the deadline passes (so I will sadly need to cancel them, which I normally wouldn't do).
Here are the details of the current WU (which has been at 100% progress since yesterday evening):
Application => rosetta python projects 0.21 (vbox64)
Name => Diff_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_2_5_16878_11
Status => Active
Received => 08.11.2023 13:45:29
Expiration date => 11.11.2023 13:45:15
Estimated calculation effort => 80,000 GFLOPs
Processor time => 1d 05:24:20
Processor time since last checkpoint => 00:07:08
Runtime to date => 21:23:25
Estimated time remaining => 00:00:00
Progress => 100.000%
Memory required => 62.15 MB
Size of the work package => 3.26 GB
Directory => slots/21
Process no. => 10440
Progress rate => 4.680% per hour
Executable file => vboxwrapper_26203_windows_x86_64.exe

Fardringle Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901	Message 7433 - Posted: 10 Nov 2023, 14:25:52 UTC Last modified: 10 Nov 2023, 14:33:31 UTC
I received a total of 12 tasks from this test batch. Five completed successfully. Seven failed because the virtual machine became "stale" when the application thought it was stuck. In each of the failed tasks, it looks like the VM was put into a paused state instead of shutting down, and then, because it was in an unexpected state, the task marked itself as invalid.
It does not seem to be related to the actual run time. Some of the successful tasks completed in about 15-16 minutes. Others took several hours to finish. The failed tasks have a pretty wide variety of run times as well.
https://ralph.bakerlab.org/results.php?userid=717&offset=0&show_names=0&state=6&appid=

Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3	Message 7434 - Posted: 11 Nov 2023, 14:46:09 UTC Last modified: 11 Nov 2023, 14:49:48 UTC
I'm seeing these take up to a day (and still not finish), but on an old slow computer (unfortunately only that one seems to get Ralph). I guess I should let them run to the end? I was cancelling anything lasting a lot longer than the standard 4/8/12 hours we see on Rosetta. They are still using a full CPU core each. The guidelines do also say "Rule 1. Please do not abort work units.", although those are from 2006.

Fardringle Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901	Message 7435 - Posted: 12 Nov 2023, 2:02:26 UTC - in response to Message 7434.
Yes, let them try to run to completion, especially on an older computer that might take a while. I had a few take 4-6 hours on my pretty fast Ryzen 9 3900X, so a slow computer could easily take several times that long to finish.

Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3	Message 7436 - Posted: 12 Nov 2023, 2:27:40 UTC - in response to Message 7435.
Some will presumably be auto-cancelled by BOINC because they won't start in time. BOINC is doing the usual trick of downloading enough work for 4 cores when only one is allowed in the preferences (25% usage). I'm sick to death of reporting schoolboy errors to BOINC programmers.

aendgraend Joined: 24 Oct 08 Posts: 5 Credit: 330,528 RAC: 0	Message 7437 - Posted: 13 Nov 2023, 11:24:09 UTC - in response to Message 7432. Last modified: 13 Nov 2023, 11:24:56 UTC
The WU mentioned in Message 7432 is still running and still at 100% progress:
Application => rosetta python projects 0.21 (vbox64)
Name => Diff_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_2_5_16878_11
Status => Active
Received => 08.11.2023 13:45:29
Expiration date => 11.11.2023 13:45:15
Estimated calculation effort => 80,000 GFLOPs
Processor time => 3d 17:10:49
Processor time since last checkpoint => 00:06:32
Runtime to date => 2d 21:15:29
Estimated time remaining => ---
Progress => 100.000%
Memory required => 63.26 MB
Size of the work package => 3.26 GB
Directory => slots/21
Process no. => 4464
Progress rate => 1.440% per hour
Executable file => vboxwrapper_26203_windows_x86_64.exe
As my profile doesn't show any workunit in progress (all are shown as 'Timeout - no response'), I'm sadly going to cancel it.
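The "Progress rate" figures in the two snapshots look like nothing more than progress divided by elapsed runtime, which is why the rate keeps falling while the task sits at 100%. A minimal sketch of that arithmetic follows; it is an inference from the numbers posted in this thread, not a statement of how the BOINC client actually computes the field, and `progress_rate` is a hypothetical helper name.

```python
from datetime import timedelta


def progress_rate(progress_percent: float, runtime: timedelta) -> float:
    """Average progress in percent per hour over the whole run so far."""
    hours = runtime.total_seconds() / 3600
    return progress_percent / hours


# "Runtime to date" values copied from the two snapshots in this thread.
first = timedelta(hours=21, minutes=23, seconds=25)
second = timedelta(days=2, hours=21, minutes=15, seconds=29)

# Both snapshots report 100% progress.
print(f"first snapshot:  {progress_rate(100.0, first):.3f} % per hour")
print(f"second snapshot: {progress_rate(100.0, second):.3f} % per hour")
```

Both results land within 0.01 of the reported 4.680 and 1.440 % per hour, so the falling rate says only that the clock is still running, not that the task is making further progress.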

Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3	Message 7438 - Posted: 13 Nov 2023, 20:19:35 UTC - in response to Message 7437.
"Timeout - no response" means you didn't send it back on time. It will still be accepted if you send it in late; it just means the server will have sent it to someone else, who probably has the same problem. So I think you should leave it running. I'm leaving mine on.

marmot Joined: 24 May 19 Posts: 1 Credit: 2,183 RAC: 0	Message 7450 - Posted: 5 Dec 2023, 19:21:21 UTC Last modified: 5 Dec 2023, 19:23:38 UTC
Been trying to get one of these on my laptop, which is attached to all my projects. It's a 10th gen Intel that successfully runs my own VMs (Win 11 and Win 10 with minimal services, and a tiny Linux install). Haven't been able to get a single Python WU.
One project's WUs don't work with VBox 7+ yet, so it's on VBox 6.1.42. Is this the most likely reason the server isn't sending the WU?
Of course, it might just be that this client missed every single batch and rerun put out. I've noticed the beta servers tend to send work to clients that have been attached longest and have prior credit. This one has no credit.
<scheduler_reply>
<scheduler_version>707</scheduler_version>
<dont_use_dcf/>
<master_url>https://ralph.bakerlab.org/</master_url>
<request_delay>31.000000</request_delay>
<message priority="low">No tasks sent</message>
<project_name>ralph@home</project_name>
<symstore>https://boinc.bakerlab.org/rosetta/symstore</symstore>
<user_name>marmot</user_name>
<user_total_credit>2182.807721</user_total_credit>
<user_expavg_credit>0.099605</user_expavg_credit>
<project_preferences>
  <resource_share>100</resource_share>
  <no_cpu>0</no_cpu>
  <project_specific>
    <max_fps>20</max_fps>
    <max_cpu>20</max_cpu>
    <cpu_run_time>86400</cpu_run_time>
    <bg>2</bg>
  </project_specific>
</project_preferences>
<host_total_credit>0.000000</host_total_credit>
<host_expavg_credit>0.000000</host_expavg_credit>
<host_venue></host_venue>
<host_create_time>1666233677</host_create_time>
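For anyone wanting to pull the relevant fields out of a reply like the one above, here is a rough sketch using Python's standard `xml.etree.ElementTree`. `SAMPLE_REPLY` is an abridged copy of the posted reply, closed by hand so that it parses; `summarize_reply` is a hypothetical helper, not part of BOINC. It assumes the reply is well-formed XML, which a real scheduler reply file is not guaranteed to be.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Abridged copy of the reply posted above. The closing tag was added by hand
# (the posted text is cut off), so this is an illustration, not a full reply.
SAMPLE_REPLY = """<scheduler_reply>
<scheduler_version>707</scheduler_version>
<request_delay>31.000000</request_delay>
<message priority="low">No tasks sent</message>
<project_name>ralph@home</project_name>
<host_total_credit>0.000000</host_total_credit>
<host_create_time>1666233677</host_create_time>
</scheduler_reply>"""


def summarize_reply(xml_text: str) -> dict:
    """Extract the server messages and a few host fields from a reply."""
    root = ET.fromstring(xml_text)
    created = int(root.findtext("host_create_time", default="0"))
    return {
        # Every <message> element with its priority attribute and text.
        "messages": [
            (m.get("priority"), (m.text or "").strip())
            for m in root.iter("message")
        ],
        # Seconds the client is asked to wait before the next request.
        "request_delay_s": float(root.findtext("request_delay", default="0")),
        "host_total_credit": float(root.findtext("host_total_credit", default="0")),
        # Unix timestamp converted to an ISO 8601 string in UTC.
        "host_created_utc": datetime.fromtimestamp(created, tz=timezone.utc).isoformat(),
    }


for key, value in summarize_reply(SAMPLE_REPLY).items():
    print(f"{key}: {value}")
```

The only server message in the posted reply is "No tasks sent", with no reason attached, so the reply alone does not show whether the VirtualBox version is what is blocking work.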