Rosetta 4.12+

yoerik
Joined: 28 Mar 20
Posts: 9
Credit: 2,536
RAC: 0
Message 6703 - Posted: 7 Apr 2020, 18:58:36 UTC - in response to Message 6702.  

I hope they will test version 4.15 widely before releasing it to production on Rosetta@Home.


Posts from the admin give me hope - but they'll need more volunteers here in order to ensure that, from what I understand.
ID: 6703

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 6704 - Posted: 7 Apr 2020, 20:32:02 UTC - in response to Message 6703.  

Posts from the admin give me hope - but they'll need more volunteers here in order to ensure that, from what I understand.

It's not a problem. If you release work, the volunteers will arrive.
ID: 6704

robertmiles
Joined: 13 Jan 09
Posts: 100
Credit: 331,865
RAC: 0
Message 6705 - Posted: 7 Apr 2020, 22:34:54 UTC - in response to Message 6703.  

I hope they will test version 4.15 widely before releasing it to production on Rosetta@Home.


Posts from the admin give me hope - but they'll need more volunteers here in order to ensure that, from what I understand.

That's what Ralph@home is for - testing new versions before they are released on Rosetta@home.
ID: 6705

yoerik
Joined: 28 Mar 20
Posts: 9
Credit: 2,536
RAC: 0
Message 6706 - Posted: 7 Apr 2020, 22:52:26 UTC - in response to Message 6704.  
Last modified: 7 Apr 2020, 23:22:47 UTC

Posts from the admin give me hope - but they'll need more volunteers here in order to ensure that, from what I understand.

It's not a problem. If you release work, the volunteers will arrive.


From the Admin's post earlier:
4.12 was tested on Ralph but not thoroughly enough. We wanted to get it out anyway so that we can start working on the scaffolds. Time is important. We've been trying our best to get this next app version pushed out. But want it thoroughly tested now since we are still able to get important COVID-19 work done on R@h with 4.12.


I'm inferring that they wanted to test it further, but time constraints forced them to release it to the public build sooner - they didn't have enough volunteers here to test it thoroughly without delaying their research.

Hence, they have time now, so there's no urgent rush at the moment. It implies that they do need to get 4.15 out in order to do the next stage of work, while 4.12+ on the public release can do important work for now.

It's all inferred, but given that there are only 269 active users and 502 active hosts here, I sincerely doubt they have enough volunteers.
ID: 6706

nastasache
Joined: 6 Apr 20
Posts: 2
Credit: 2,754
RAC: 0
Message 6707 - Posted: 7 Apr 2020, 23:16:13 UTC - in response to Message 6706.  

Maybe they are not promoting the test stage enough. I heard about Ralph almost by accident.
ID: 6707

Tom Rinehart
Joined: 31 Mar 20
Posts: 4
Credit: 0
RAC: 0
Message 6708 - Posted: 8 Apr 2020, 2:33:04 UTC - in response to Message 6687.  

I went ahead and posted the OSX update on R@h. We plan to update the rest of the platforms in the next day or so.


On Rosetta@home, the Mac 4.15 app is working well. I have had 3 tasks end in a computation error at the end of processing. I've had trouble with the Rosetta Mini 3.78 app: its tasks all fail immediately, like those of the Rosetta 4.12 Mac app.

It is giving errors like:

<core_client_version>7.14.4</core_client_version>
<![CDATA[
<message>
process exited with code 255 (0xff, -1)</message>
<stderr_txt>
[2020- 4- 7 19:41:47:] :: BOINC:: Initializing ... ok.
[2020- 4- 7 19:41:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
command: minirosetta_3.78_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -corrections::beta_nov16 -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip CF_monomer_03_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2413101
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
ERROR: Option matching -corrections:beta_nov16 not found in command line top-level context</stderr_txt>
]]>


The errors I get on the Mac 4.15 app are mostly like this one:

<core_client_version>7.14.4</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.15_x86_64-apple-darwin -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--il1r_design_boinc_v1.xml @flags_il1r2 -in:file:silent 8er4nd4m_Mini_Protein_binds_IL1R_COVID-19_design5.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 8er4nd4m_Mini_Protein_binds_IL1R_COVID-19_design5.zip @8er4nd4m_Mini_Protein_binds_IL1R_COVID-19_design5.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3296593
Starting watchdog...
Watchdog active.
======================================================
DONE ::   283 starting structures  29066.9 cpu seconds
This process generated    283 decoys from     283 attempts
======================================================
BOINC :: WS_max 8.82074e+08

BOINC :: Watchdog shutting down...
06:57:38 (55517): called boinc_finish(0)

</stderr_txt>
<message>
finish file present too long</message>
]]>


It looks like I also got a few of these on the Mac 4.09 app.
ID: 6708

Plomos
Joined: 8 Jul 12
Posts: 4
Credit: 226
RAC: 0
Message 6709 - Posted: 8 Apr 2020, 6:22:01 UTC

So I had the same error again on two more units that I pulled from the server only a few hours ago.

<core_client_version>7.16.1</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/ralph.bakerlab.org/rosetta_4.15_i686-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--covid_spike_design_boinc_v1.xml @flags_Junior_HalfRoid_vs_COVID-19_test1 -in:file:silent 6np3ll6z_Junior_HalfRoid_vs_COVID-19_test1.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 6np3ll6z_Junior_HalfRoid_vs_COVID-19_test1.zip @6np3ll6z_Junior_HalfRoid_vs_COVID-19_test1.flags -nstruct 10000 -cpu_run_time 3600 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3983671
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 18299.9s, 14400s + 3600s[2020- 4- 8  0:53:20:] :: BOINC 
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE ::     1 starting structures  18299.9 cpu seconds
This process generated      1 decoys from       1 attempts
======================================================
00:53:20 (8724): called boinc_finish(0)

</stderr_txt>
]]>


It seems that this happens both here at Ralph and at main Rosetta whenever the system sends me 32-bit tasks instead of 64-bit ones. On Rosetta, the 64-bit tasks run as they should, but the 32-bit tasks (4.12 there as well as 4.15 here) do not run right and produce only one decoy after hours of work. Hopefully this can be fixed.
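
For anyone who wants to check their own results for this pattern, here is a minimal sketch that pulls the decoy count and CPU time out of a task's stderr text. It assumes the summary lines look exactly like the ones in the logs above; the function name `summarize_stderr` and the returned field names are made up for illustration and are not part of Rosetta or BOINC.

```python
import re

# Patterns copied from the end-of-run summary lines seen in the stderr above.
DONE_RE = re.compile(r"DONE ::\s+(\d+) starting structures\s+([\d.]+) cpu seconds")
DECOY_RE = re.compile(r"This process generated\s+(\d+) decoys from\s+(\d+) attempts")
WARNING_TEXT = "cannot get file size for default.out.gz"


def summarize_stderr(text):
    """Return decoy count, CPU time and warning flag, or None if no summary is present."""
    done = DONE_RE.search(text)
    generated = DECOY_RE.search(text)
    if done is None or generated is None:
        return None  # the task did not reach the normal end-of-run summary
    cpu_seconds = float(done.group(2))
    decoys = int(generated.group(1))
    return {
        "decoys": decoys,
        "attempts": int(generated.group(2)),
        "cpu_seconds": cpu_seconds,
        # Guard against a zero CPU time so the rate never divides by zero.
        "decoys_per_hour": decoys * 3600.0 / cpu_seconds if cpu_seconds > 0 else 0.0,
        "file_size_warning": WARNING_TEXT in text,
    }
```

A result with one decoy after several hours of CPU time, as in the i686 log above, stands out immediately next to a task that reports hundreds of decoys in a comparable time.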
ID: 6709

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 6710 - Posted: 8 Apr 2020, 7:56:45 UTC - in response to Message 6706.  
Last modified: 8 Apr 2020, 8:02:33 UTC

I'm inferring that they wanted to test it further, but time constraints forced them to release it to the public build sooner - they didn't have enough volunteers here to test it thoroughly without delaying their research.

I know, I've read the admin's post.
I also know that version 4.15 contains not only bug fixes, but also some new science ("some new code related to COVID-19 interface design that we would like to push out to R@h soon.").
So, it is important to test it.

It's all inferred, but given that there are only 269 active users and 502 active hosts here, I sincerely doubt they have enough volunteers.

After months and months of no work and no news, volunteers have gone (look at the registration dates on the first page of top users: a lot of new users. Old users got tired of waiting).
But if you give work and news, people will arrive (see, for example, the forum and the WUs of Rosetta).
(Also, support for Raspberry will give more platforms to test.)
ID: 6710

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 6711 - Posted: 8 Apr 2020, 7:58:09 UTC - in response to Message 6707.  

Maybe they are not promoting the test stage enough.

For sure!
The link to this beta project on the Rosetta@Home home page is very recent.
ID: 6711

Mad_Max
Joined: 15 Nov 12
Posts: 15
Credit: 404,700
RAC: 0
Message 6712 - Posted: 8 Apr 2020, 15:51:09 UTC - in response to Message 6698.  
Last modified: 8 Apr 2020, 16:07:42 UTC

I aborted some of the older test batches. I'm not sure why your client is getting confused and running the wrong app. It should be running the 64bit version on your 64bit computer.

I was writing about getting the 64-bit "wrapper" app on 32-bit machines, including old ones running WinXP.
Of course, all such WUs fail, as Win32 systems cannot execute any 64-bit apps. They produce the error "Application is not a valid Win32 app" right at the start.

I don't have any problems on 64-bit Windows systems currently. The latest problem was download failures of small files, but it looks like that is resolved now, as I haven't seen such errors for about a week.

If older systems are no longer supported by the project, you should adjust the server scheduler accordingly, so that it does not send tasks to such machines and instead responds with an error/warning. Sending work to a host doomed to a 100% error rate wastes internet bandwidth and adds excess server load.
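
To make the suggestion concrete, here is a rough sketch of the kind of check meant. It is not the BOINC scheduler's actual code: the function name `pick_app_platform` and the compatibility table are assumptions for illustration, and the table assumes a 64-bit host can also run the matching 32-bit build, which on Linux additionally depends on 32-bit libraries being installed.

```python
# Assumed mapping: host platform -> app platforms that host can execute.
RUNNABLE = {
    "windows_intelx86": {"windows_intelx86"},
    "windows_x86_64": {"windows_x86_64", "windows_intelx86"},
    "i686-pc-linux-gnu": {"i686-pc-linux-gnu"},
    "x86_64-pc-linux-gnu": {"x86_64-pc-linux-gnu", "i686-pc-linux-gnu"},
}


def pick_app_platform(host_platform, available_platforms):
    """Return the app platform to send to this host, or None to send no work."""
    # Prefer a build for the host's native platform when one exists.
    if host_platform in available_platforms:
        return host_platform
    # Otherwise fall back to any other build the host is able to execute.
    runnable = RUNNABLE.get(host_platform, {host_platform})
    for platform in sorted(available_platforms):
        if platform in runnable:
            return platform
    # Nothing runnable: better to reply with a warning than to send doomed tasks.
    return None
```

With only a `windows_x86_64` build available, a `windows_intelx86` host gets `None` here, which is the point at which the server would answer with a message instead of work.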
ID: 6712

xii5ku
Joined: 8 Apr 20
Posts: 2
Credit: 23,307
RAC: 0
Message 6714 - Posted: 10 Apr 2020, 8:22:48 UTC
Last modified: 10 Apr 2020, 9:06:43 UTC

Linux i686 application version problem in v4.12 + v4.15
(100% reproducible on my Linux EM64T hosts; the problem is not reproducible with the Linux x86-64 application version)

On April 7 at Rosetta@home, I reported that all "Rosetta v4.12 i686-pc-linux-gnu" tasks got stuck at 1 decoy and finished after target CPU time + 4 h watchdog overtime, whereas all "Rosetta v4.12 x86_64-pc-linux-gnu" ran normally on the same hosts. (Rosetta forum thread "Rosetta v4.12 i686-pc-linux-gnu" : fixed 20 h CPU time, fixed 20 credits)

Last night I received a bunch of tasks from Ralph on 4 of the same set of computers.
I had the default target CPU time configured at Ralph, which is 1 hour.

I have 257 valid results, of 257 tasks received:

  • All 169 "Rosetta v4.15 x86_64-pc-linux-gnu" tasks finished after 1 hour and generated on the order of 20...40 decoys each, according to spot checks.
  • All 88 "Rosetta v4.15 i686-pc-linux-gnu" tasks finished after 5 = 1+4 hours and generated (3x) 9, 8, (3x) 7, 6, (3x) 5, (8x) 4, (5x) 3, (9x) 2, (55x) 1 decoys.

So there is slight progress from v4.12 to v4.15 on my hosts, but not a breakthrough yet.

i686 tasks with more than 1 decoy, and a minority of 1-decoy tasks, received varying but of course low credit.

The majority of i686 1-decoy tasks received the usual fixed 20.00 credits. These ones had the "WARNING! cannot get file size for default.out.gz: could not open file." line in their stderr, while the other tasks with more or less than 20.00 credits did not.

host 44866: received 54 i686 tasks
host 44867: received 53 x86_64 tasks, same hardware and OS as 44866
host 44869: received 34 i686 tasks, almost same hardware, same OS
host 44870: received 116 x86_64 tasks, different hardware, similar OS

hosts 44866...44869: dual Broadwell-EP, openSUSE 15.0
. . . . These hosts received both i686 and x86_64 jobs at Rosetta and at Ralph, with the described consistent results.

host 44870: dual Rome, openSUSE 15.1
. . . . This host received only x86_64 jobs at Rosetta and Ralph so far, hence it has had only good x86_64 results so far (no i686 jobs received).

I furthermore have a single-socket Haswell with Gentoo Linux, which received only x86_64 jobs at Rosetta but no jobs at Ralph yet.

My prior report on Rosetta v4.12 was made while I ran Rosetta@home exclusively on the hosts. This report on v4.15 was made while I ran TN-Grid with 0 % resource share + Ralph with 100 % resource share. But at least three of the four computers ended up with almost all threads running Ralph jobs that way, whereas the third computer worked on a mixed workload of ~1/3rd Ralph + ~2/3rds TN-Grid.
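
As a quick cross-check of the i686 numbers above, the following small sketch totals the decoy distribution and turns it into a throughput figure. The counts are taken from the list in this post; the helper name `tally` is only for illustration, and the 5 hours per task is the 1 h target plus 4 h watchdog overtime.

```python
# decoys per task -> number of i686 tasks with that count (from the list above)
i686_decoys = {9: 3, 8: 1, 7: 3, 6: 1, 5: 3, 4: 8, 3: 5, 2: 9, 1: 55}


def tally(distribution, hours_per_task):
    """Return (task count, total decoys, decoys per task-hour)."""
    tasks = sum(distribution.values())
    decoys = sum(count * n_tasks for count, n_tasks in distribution.items())
    return tasks, decoys, decoys / (tasks * hours_per_task)


tasks, decoys, rate = tally(i686_decoys, hours_per_task=5)
print(tasks, decoys, round(rate, 2))  # prints: 88 197 0.45
```

So the 88 i686 tasks produced 197 decoys in total, about 0.45 decoys per task-hour. If the spot-checked 20...40 decoys per 1-hour task are representative of the x86_64 results, that is very roughly 45 to 90 times lower.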

ID: 6714

robertmiles
Joined: 13 Jan 09
Posts: 100
Credit: 331,865
RAC: 0
Message 6716 - Posted: 10 Apr 2020, 23:01:55 UTC - in response to Message 6714.  
Last modified: 10 Apr 2020, 23:10:10 UTC

I just increased the resource share for Rosetta@Home on my computer. Too soon to see the results from that yet.

My Ralph account shows no 4.15 tasks yet. Can you tell which, if any, of these possible causes explains this?

1. They aren't testing 4.15 for Windows yet.

2. The tasks they show don't include any from the last few days, probably because the list was read from a rather obsolete copy of their database.

3. Their list of tasks for a user shows them only for a day or so.

For one-decoy tasks, note that the first decoy is usually only for testing how well your computer runs the software. That means that its output is seldom useful for any other purpose, and it might not even be sent back.

I looked at TN-Grid. They are currently not accepting new users. They are thinking of starting some COVID-19 work, which would probably start a flood of new users if they don't keep limiting them.


On another subject, can't the wrapper for 32-bit tasks be recompiled or rewritten so that it runs as a 32-bit program, at least under 32-bit operating systems? Or maybe a script could try the 64-bit wrapper first and, if that fails quickly with certain errors, try the 32-bit wrapper instead? Does this need extra testing to handle a 32-bit version of BOINC running under a 64-bit operating system?
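
The "try one wrapper, then the other" idea could look roughly like the sketch below. This is only an illustration in Python, not how BOINC launches applications: the function name `run_first_working` and the wrapper file name are hypothetical, and it only covers the case where the operating system refuses to start the binary at all, not a wrapper that starts and then fails quickly.

```python
import subprocess
import sys


def run_first_working(candidates):
    """Try each command in order; return (command, exit_code) for the first that starts."""
    for command in candidates:
        try:
            completed = subprocess.run(command)
        except OSError:
            # Raised when the binary is missing or cannot be executed on this
            # OS/architecture (for example a 64-bit executable on 32-bit Windows).
            continue
        return command, completed.returncode
    return None, None


if __name__ == "__main__":
    # Hypothetical wrapper name; the Python interpreter stands in for a 32-bit build.
    print(run_first_working([["./wrapper_64"], [sys.executable, "-c", "pass"]]))
```

Whether something like this is practical depends on how the client starts the wrapper, which I don't know, so treat it as a description of the idea rather than a proposal for the actual fix.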
ID: 6716

xii5ku
Joined: 8 Apr 20
Posts: 2
Credit: 23,307
RAC: 0
Message 6717 - Posted: 11 Apr 2020, 4:48:47 UTC - in response to Message 6716.  
Last modified: 11 Apr 2020, 4:52:53 UTC

@robertmiles, I can't respond to your Ralph@home/ Rosetta@home related points, because I am new to Ralph and lack the insight. But a quick response to this unrelated item:
robertmiles wrote:

I looked at TN-Grid. They are currently not accepting new users.
This is not correct. New users can join any time. They only need to create the account via the web site and enter the invitation code from the main page. AFAIK this is a measure to reduce spam, not to hinder new contributors from joining. That said, it is true that their work generator has always had, and still has, a limited pace. But my experience during the last few days was that my hosts remained saturated.


robertmiles wrote:
They are thinking of starting some COVID-19 work, which would probably start a flood of new users if they don't keep limiting them.
They have already started such work. They just don't communicate this widely to BOINC contributors because of the limited pace of the work generator.

/end-offtopic
ID: 6717

robertmiles
Joined: 13 Jan 09
Posts: 100
Credit: 331,865
RAC: 0
Message 6718 - Posted: 11 Apr 2020, 22:03:21 UTC - in response to Message 6717.  

@robertmiles, I can't respond to your Ralph@home/ Rosetta@home related points, because I am new to Ralph and lack the insight. But a quick response to this unrelated item:
robertmiles wrote:

I looked at TN-Grid. They are currently not accepting new users.
This is not correct. New users can join any time. They only need to create the account via the web site and enter the invitation code from the main page. AFAIK this is a measure to reduce spam, not to hinder new contributors from joining. That said, it is true that their work generator has always had, and still has, a limited pace. But my experience during the last few days was that my hosts remained saturated.
/end-offtopic

[snip]
I think I was able to create an account. I'll finally try to add the project in a few hours, after I upgrade BOINC to 7.16.5. Thank you.
ID: 6718

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 6736 - Posted: 24 Apr 2020, 5:45:40 UTC
Last modified: 24 Apr 2020, 5:45:56 UTC

160 valid, only 3 errors

<message>
upload failure: <file_xfer_error>
<file_name>Mini_Protein_binds_IL6R_COVID-19_test3_SAVE_ALL_OUT_IGNORE_THE_REST_0cj9pv7f_32_92_0_r47019903_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
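
For collecting these upload failures across many results, a small sketch like the following can extract the file name and error code from the `<message>` block. It assumes the block is well-formed XML exactly as pasted above; the function name `parse_upload_failure` and the returned keys are invented for this example.

```python
import xml.etree.ElementTree as ET


def parse_upload_failure(message_xml):
    """Extract file name and error code from a <message> block, or return None."""
    root = ET.fromstring(message_xml)
    error = root.find("file_xfer_error")
    if error is None:
        return None  # the message is not an upload failure
    code_text = error.findtext("error_code", default="").strip()
    # error_code looks like "-240 (stat() failed)": a number, then a description.
    number, _, description = code_text.partition(" ")
    if not number:
        return None
    return {
        "file_name": error.findtext("file_name", default="").strip(),
        "error_code": int(number),
        "description": description.strip("()"),
    }
```

Grouping the parsed results by `error_code` and by the batch prefix of `file_name` would show whether the failures are tied to one batch or to the app version.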
ID: 6736

Rainer Baumeister
Joined: 7 Apr 20
Posts: 2
Credit: 436,943
RAC: 0
Message 6739 - Posted: 25 Apr 2020, 10:04:29 UTC - in response to Message 6736.  

Hello,

sorry, my English is very poor.


v4.15
I use a Ryzen 3700X (default) with 32GB RAM: 30 tasks OK, 2 errors
A Ryzen 1700 (default) with 32GB causes massive problems: 4 OK, 66 errors!

Why? Both computers run VERY reliably in all other projects.
But with Rosetta I have to use Win10. :-(

With Mint, the normal Rosetta has errors anyway.

https://ralph.bakerlab.org/show_user.php?userid=58871

Greetings, Rainer

Translated with www.DeepL.com/Translator (free version)
ID: 6739

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 6740 - Posted: 25 Apr 2020, 13:44:03 UTC - in response to Message 6736.  

160 valid, only 3 errors

<message>
upload failure: <file_xfer_error>
<file_name>Mini_Protein_binds_IL6R_COVID-19_test3_SAVE_ALL_OUT_IGNORE_THE_REST_0cj9pv7f_32_92_0_r47019903_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>


Another 9 with this error (after a few seconds).
ID: 6740

robertmiles
Joined: 13 Jan 09
Posts: 100
Credit: 331,865
RAC: 0
Message 6742 - Posted: 27 Apr 2020, 19:21:54 UTC - in response to Message 6718.  

[snip]

robertmiles wrote:

I looked at TN-Grid. They are currently not accepting new users.
This is not correct. New users can join any time. They only need to create the account via the web site and enter the invitation code from the main page. AFAIK this is a measure to reduce spam, not to hinder new contributors from joining. That said, it is true that their work generator has always had, and still has, a limited pace. But my experience during the last few days was that my hosts remained saturated.

[snip]

I created the account, and have started running tasks.

They have finished creating all of the workunits for their planned COVID-19 work, and expect to have the rest of them downloaded soon.
ID: 6742

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 6749 - Posted: 28 Apr 2020, 17:02:42 UTC

Even with 4.17, after a few seconds, I get these errors, as with 4.15 (only two WUs, however):
<message>
upload failure: <file_xfer_error>
<file_name>Mini_Protein_binds_IL6R_COVID-19_test3_SAVE_ALL_OUT_IGNORE_THE_REST_5aj5gu8j_32_397_0_r1474112397_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
ID: 6749

Ivaylo Bonev
Joined: 30 Mar 20
Posts: 3
Credit: 3,702
RAC: 0
Message 6762 - Posted: 30 Apr 2020, 11:46:16 UTC - in response to Message 6749.  

Same on 4.18:
https://ralph.bakerlab.org/result.php?resultid=5034587

<message>
upload failure: <file_xfer_error>
<file_name>Mini_Protein_binds_IL6R_COVID-19_test3_SAVE_ALL_OUT_IGNORE_THE_REST_0cj9pv7f_32_749_0_r1202734223_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
ID: 6762



©2024 University of Washington
http://www.bakerlab.org