Message boards : Number crunching : rosetta_beta_4.83_i686-pc-linux-gnu -> frozen
Author | Message |
---|---|
Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
[root@crobertp root]# top
1:06am up 4 days, 6:46, 3 users, load average: 0.03, 0.05, 0.34
121 processes: 119 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 0.3% system, 0.0% nice, 99.6% idle
Mem: 248164K av, 242340K used, 5824K free, 0K shrd, 26016K buff
180816K actv, 0K in_d, 4772K in_c, 44280K target
Swap: 1020088K av, 94304K used, 925784K free 61204K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM CTIME COMMAND
1968 boinc 9 0 3624 3528 1756 S 0.0 1.4 5656m ./boinc -redirectio -allow_remote_gui_rpc
27682 boinc 19 19 1240 1240 944 S N 0.0 0.4 0:00 /bin/bash ./yasuc.sh
28607 boinc 19 19 62128 43M 5116 S N 0.0 18.1 29:55 rosetta_beta_4.83_i686-pc-linux-gnu cc 1tig _ -relax -stringen
28608 boinc 19 19 62128 43M 5116 S N 0.0 18.1 0:00 rosetta_beta_4.83_i686-pc-linux-gnu cc 1tig _ -relax -stringen
28609 boinc 19 19 62128 43M 5116 S N 0.0 18.1 0:00 rosetta_beta_4.83_i686-pc-linux-gnu cc 1tig _ -relax -stringen
29298 boinc 9 0 2484 2436 2212 S 0.0 0.9 0:00 /usr/sbin/sshd
29300 boinc 9 0 2352 2348 1200 S 0.0 0.9 0:01 -bash
29864 boinc 19 19 624 624 548 S N 0.0 0.2 0:00 sleep 600
crobertp [/home/boinc/BOINC] > cat /proc/version
Linux version 2.4.21-31301U90_4cl (andreas@buildmaster.distro.conectiva) (gcc version 3.2.2) #1 Qui Jun 26 01:44:43 BRT 2003
When will the ice melt? When will this WU be done? I am leaning toward aborting it; is that OK? What else can I do?
11) -----------
name: BARCODE_30_1tig__NATIVE_210_39_0
WU name: BARCODE_30_1tig__NATIVE_210_39
project URL: https://ralph.bakerlab.org/
report deadline: Fri Feb 24 17:44:01 2006
ready to report: no
got server ack: no
final CPU time: 0.000000
state: 2
scheduler state: 2
exit_status: 0
signal: 0
suspended via GUI: no
aborted via GUI: no
active_task_state: 1
stderr_out:
app version num: 483
checkpoint CPU time: 1439.690000
current CPU time: 1763.480000
fraction done: 0.199956
VM usage: 0.000000
resident set size: 0.000000
estimated CPU time remaining: 9214.806238
supports graphics: no
12) ----------- |
Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Well, I changed the setting "Leave applications in memory while preempted?" from yes to no. The other PCs at location "work" run only one project, so they should not be affected by this setting :-) On this PC I run ralph and boincsimap, and the simap app exits every time it is swapped out anyway, so "no" should perform no worse than "yes". I then refreshed the ralph preferences, killed boinc, and restarted it. Let's wait a couple of hours and see what happens. |
Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
The job crunched, but after some time it errored!
2006-02-18 02:41:22 [boincsimap] Finished download of 200601277.032211
2006-02-18 02:41:22 [boincsimap] Throughput 28987 bytes/sec
2006-02-18 02:41:23 [---] request_reschedule_cpus: files downloaded
2006-02-18 02:41:23 [ralph@home] Restarting result BARCODE_30_1tig__NATIVE_210_39_0 using rosetta_beta version 483
2006-02-18 02:41:23 [boincsimap] Pausing result 200601277.029660_1 (removed from memory)
4 of 4 test sequences read
3026 of 3026 database sequences read
2429 of 2429 query sequences read
2006-02-18 02:41:24 [---] request_reschedule_cpus: process exited
2006-02-18 03:29:56 [ralph@home] Unrecoverable error for result BARCODE_30_1tig__NATIVE_210_39_0 (process exited with code 131 (0x83))
2006-02-18 03:29:56 [---] request_reschedule_cpus: process exited
2006-02-18 03:29:56 [ralph@home] Computation for result BARCODE_30_1tig__NATIVE_210_39_0 finished
2006-02-18 03:29:56 [boincsimap] Restarting result 200601277.029660_1 using simap version 507
2006-02-18 03:30:56 [ralph@home] Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
2006-02-18 03:30:56 [ralph@home] Reason: To report results
2006-02-18 03:30:56 [ralph@home] Reporting 1 results
2006-02-18 03:31:16 [ralph@home] Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
Result page:
Exit status 131 (0x83)
Computer ID 459
Report deadline 24 Feb 2006 20:44:01 UTC
CPU time 4296.17
stderr out <core_client_version>5.2.14</core_client_version> <message>process exited with code 131 (0x83) </message> <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 3997032 [0x8749543] [0x8761a7c] [0x87c79c8] [0x87e23ac] [0x87e3c7d] [0x87b2947] [0x87b43e1] [0x85eba8f] [0x84237c0] [0x84242bf] [0x8424ee3] [0x8432f1d] [0x8434df2] [0x86e7593] [0x85f2184] [0x85f3808] [0x83ee90e] [0x83f130b] [0x87c0d74] [0x8048121] # cpu_run_time_pref: 7200 [0x8749543] [0x8761a7c] [0x87c79c8] [0x87422b8] [0x856a1bf] [0x845982c] [0x8441ff8] [0x8445190] [0x86e7091] [0x85f22c5] [0x85f3808] [0x83ee90e] [0x83f130b] [0x87c0d74] [0x8048121] No heartbeat from core client for 31 sec - exiting SIGSEGV: segmentation violation Stack trace (15 frames): Exiting... </stderr_txt>
Validate state Invalid
Claimed credit 12.7700238041793
Granted credit 0
application version 4.83 |
Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Another WU is frozen:
crobertp [/home/boinc/BOINC] > ps xu
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
boinc 27682 0.0 0.4 2616 1036 ? SN Feb17 0:00 /bin/bash ./yasuc.sh
boinc 30171 0.0 1.3 5892 3248 ? S 01:50 0:07 ./boinc -redirectio -allow_remote_gui_rpc -return_results_imme
boinc 31123 41.3 20.9 148776 51944 ? SN 05:00 70:37 rosetta_beta_4.83_i686-pc-linux-gnu cc 1shf A -relax -stringen
boinc 31124 0.0 20.9 148776 51944 ? SN 05:00 0:00 rosetta_beta_4.83_i686-pc-linux-gnu cc 1shf A -relax -stringen
boinc 31125 0.0 20.9 148776 51944 ? SN 05:00 0:00 rosetta_beta_4.83_i686-pc-linux-gnu cc 1shf A -relax -stringen
boinc 31679 0.0 0.8 7200 2136 ? S 07:47 0:00 /usr/sbin/sshd
boinc 31680 0.1 0.9 3480 2336 pts/4 S 07:47 0:00 -bash
boinc 31729 0.0 0.2 2084 624 ? SN 07:48 0:00 sleep 600
boinc 31744 0.0 0.2 2544 668 pts/4 R 07:51 0:00 ps xu
crobertp [/home/boinc/BOINC] > free
total used free shared buffers cached
Mem: 248164 244772 3392 0 24364 67016
-/+ buffers/cache: 153392 94772
Swap: 1020088 77832 942256
crobertp [/home/boinc/BOINC] > w
7:52am up 4 days, 13:33, 2 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
saigam pts/1 matrix.cp3 11:18pm 7:23m 0.17s 0.17s -bash
boinc pts/4 200.216.141.84 7:47am 0.00s 0.27s 0.01s w
crobertp [/home/boinc/BOINC] >
What should I do? This one is at 54.99% done!
*Note the load average: the whole system is doing nothing.
*Maybe the 1% bug has advanced to become the 54.99% bug? |
Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
After 1 hour with ralph doing nothing, boinc switched to the next app (simap). Note that the system is now using CPU:
crobertp [/home/boinc/BOINC] > w
8:59am up 4 days, 14:40, 2 users, load average: 1.00, 1.00, 0.93
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
saigam pts/1 matrix.cp3 11:18pm 8:29m 0.17s 0.17s -bash
boinc pts/4 200.216.141.84 7:47am 0.00s 0.30s 0.00s w
crobertp [/home/boinc/BOINC] >
boinc log:
2006-02-18 07:02:10 [---] request_reschedule_cpus: process exited
2006-02-18 08:02:11 [ralph@home] Pausing result BARCODE_30_1shfA_NATIVE_210_41_0 (removed from memory)
2006-02-18 08:02:11 [boincsimap] Restarting result 200601277.029753_0 using simap version 507
However, it was *not* removed from memory! ps xu showed rosetta still in RAM after boinc said "(removed from memory)". When boinc said the same for simap, the simap app no longer appeared in ps xu afterwards. I believe this "(removed from memory)" that does *not* actually remove the process is what causes the 54.99% bug! The evidence for this theory is earlier in this thread: the previous WU resumed crunching after I killed boinc and rosetta was no longer occupying RAM. |
Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
I did a pkill rosetta and the result was:
2006-02-18 09:34:49 [ralph@home] Result BARCODE_30_1shfA_NATIVE_210_41_0 exited with zero status but no 'finished' file
2006-02-18 09:34:49 [ralph@home] If this happens repeatedly you may need to reset the project.
2006-02-18 09:34:49 [---] request_reschedule_cpus: process exited
2006-02-18 09:34:49 [ralph@home] Restarting result BARCODE_30_1shfA_NATIVE_210_41_0 using rosetta_beta version 483
crobertp [/home/boinc/BOINC] > w
9:39am up 4 days, 15:20, 2 users, load average: 0.99, 0.61, 0.32
*Afterwards rosetta returned to crunching: see the load average.
*It seems that rosetta must be removed from RAM when switching apps. If for some reason it remains in RAM, it freezes at 0.0% CPU.
Maybe the cause is that I only have 256 MB of RAM and the system moves inactive pages to swap; later, when its pages come back from swap, rosetta does not behave well. |
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
This could be a coincidence, but in my case too, the problem of the Rosetta v4.80 process getting stuck "frozen" (not consuming any CPU time) appears on a Linux machine (Debian Sarge, kernel 2.4.27) with only 256MB RAM (but plenty of virtual memory). No other science apps had a problem on that machine (I tried 6 projects), including WCG/HPF, which runs an older version of Rosetta, v4.21. PS: I haven't attached this (underspec'ed) machine to RALPH. |
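
For anyone seeing the same symptoms, the checks done by hand in this thread can be sketched in Python. This is only a sketch under stated assumptions, not something taken from BOINC or Rosetta: the function names are made up for illustration, the "128 + signal number" reading of an exit code is the usual shell convention and may not be how BOINC produced 131 here (it would give signal 3, while stderr reports SIGSEGV), and the `/proc/<pid>/stat` field positions follow the proc(5) layout, which is assumed to hold on the kernels discussed above.

```python
from typing import Optional, Tuple


def signal_from_exit_code(code: int) -> Optional[int]:
    """Shell convention (assumption): status 128 + N means 'killed by signal N'."""
    return code - 128 if 128 < code < 256 else None


def memory_without_cache(used_kb: int, free_kb: int,
                         buffers_kb: int, cached_kb: int) -> Tuple[int, int]:
    """Recompute the '-/+ buffers/cache' row of `free` as (used, free) in KB."""
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is in parentheses and may contain spaces,
    # so split on the last ')' before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def looks_frozen(before: str, after: str) -> bool:
    """True if no CPU time was accumulated between two stat samples."""
    return cpu_ticks(after) == cpu_ticks(before)


if __name__ == "__main__":
    # Exit status reported for BARCODE_30_1tig__NATIVE_210_39_0.
    print(signal_from_exit_code(131))  # 3
    # Values from the `free` output posted above.
    print(memory_without_cache(244772, 3392, 24364, 67016))  # (153392, 94772)
    # Hypothetical stat samples, taken some minutes apart.
    sample = "31123 (rosetta_beta) S 30171 27682 27682 0 -1 64 100 0 5 0 4000 237"
    print(looks_frozen(sample, sample))
```

On a live machine the two samples would come from reading `/proc/<pid>/stat` twice, a few minutes apart; a task that BOINC reports as running but whose tick count does not move matches the "frozen" state described in this thread.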
©2025 University of Washington
http://www.bakerlab.org