Message boards : Number crunching : Ralph and SSEx
Author | Message |
---|---|
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
Interesting discussion about SSEx/AVX extensions in the Rosetta code. SSE: All post-Pentium 4 CPUs (newer than Nov. 2000) support the SSE2 register model. Simply adding the SSE2 target option to the builds would require the machines to have been made this century, but would use the SSE registers. The 16 directly addressable registers would reduce register stores to the stack and code scheduling (less shuffling of data around and more computation). |
Chilean Send message Joined: 31 Jul 09 Posts: 12 Credit: 38,068 RAC: 0 |
I just added RALPH into my BOINC client to help out in case the admins take notice. |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
I just added RALPH into my BOINC client to help out in case the admins take notice. in remote case.... :-( |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
Discussion about SSE continues here |
rjs5 Send message Joined: 5 Jul 15 Posts: 22 Credit: 135,602 RAC: 3,244 |
Ok, ok. I joined the project with a Win7 and a Fedora 21 Haswell machine. Now if David wants to feed me beta binaries for comment, he can. Since this is a beta site, he can even send the same workload out repeatedly ... and can even hide it somewhat with different names and few will even know ... 8-) Sending the same workload will allow him to test for the acceptable results on different machine configurations. I really hate joining inactive projects. |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
I really hate joining inactive projects. +1 September is here, waiting for wus... |
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
I just updated the minirosetta_beta app. I did not include SSE linux builds since they will require more testing. I did turn on SSE for the windows build. The latest linux SSE3 test was causing a significant number of failures. We don't have much time/resources to test these optimizations and it would be great if any of you would like to volunteer to help. As stated before, we can provide the source, build instructions, and tests. If you are interested please contact me directly at dekim AT u.washington.edu |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
I did turn on SSE for the windows build. Just a curiosity: which version of SSE for win? 2? 3? 4.1??? |
Chilean Send message Joined: 31 Jul 09 Posts: 12 Credit: 38,068 RAC: 0 |
I just updated the minirosetta_beta app. I did not include SSE linux builds since it will require more testing. I did turn on SSE for the windows build. The latest linux SSE3 test was causing a significant amount of failures. rjs5 over at this thread in R@H seemed very willing to help, don't know if you guys talked about the source code, et cetera via inbox. |
Chilean Send message Joined: 31 Jul 09 Posts: 12 Credit: 38,068 RAC: 0 |
I did turn on SSE for the windows build. I think it's only SSE "1" so far. No errors except for a few WUs that failed immediately, but it doesn't appear to be a SSE-related problem. My WUs with SSE |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
I think it's only SSE "1" so far. Looking at other BOINC projects, I think that SSE2 is the minimum needed to see some improvement |
rjs5 Send message Joined: 5 Jul 15 Posts: 22 Credit: 135,602 RAC: 3,244 |
I did turn on SSE for the windows build. SSE was introduced in 1999 with the Pentium-3 CPU; SSE2 was introduced in 2000 with the Pentium-4 CPU and only extended SSE. If something works under SSE, then it will work under SSE2 UNLESS it is a Pentium-3-era CPU. The project will get more work done by sacrificing the Pentium-3 cycles (making SSE2 the minimum) and optimizing for SSE2+. Once you get to SSE2, you will only get minor improvements, probably just a couple %, by going to the trouble of pushing the SSE/AVX envelope. Since R@H is compiled and running in SCALAR mode, which crunches only one 64-bit value in the 128-bit (dual 64-bit) XMM registers, there is much more to gain by closely examining the source code and understanding what is preventing the compilers from VECTORIZING the code. If you can use BOTH 64-bit fields in the XMM registers, you get a 2x performance increase. You crunch 2, 4, 8, ... floating point values in the same time as 1. This is also the reason that there is no GPU version and can NEVER be a GPU version until this is fixed .... IF the source can be changed to be vectorized. Starting from a generic, crappy 32-bit i386 version, ..... you get 80% of the scalar performance by just generating a 64-bit version. You get the other 20% of scalar performance by messing with compiler options .... but at a high portability cost. The next barrier after a 64-bit version should be SSE2. The next barrier after 64-bit, SSE2 is VECTOR .... NOT .... SSE3, SSE4, AVX, ... |
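To make the scalar-versus-vector point concrete, here is a minimal sketch in C. It is not taken from the Rosetta source; the function names `scaled_sum` and `running_recurrence` are hypothetical and chosen only for illustration. The first loop has independent iterations, which is the shape a compiler can usually pack two doubles at a time into a 128-bit XMM register. The second loop has a loop-carried dependency, which is one common reason compilers leave code scalar.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: out[i] depends only on a[i] and b[i].
   The restrict qualifiers promise the arrays do not overlap, which
   removes one common obstacle to auto-vectorization. */
void scaled_sum(const double *restrict a, const double *restrict b,
                double *restrict out, size_t n, double k)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + k * b[i];
}

/* Loop-carried dependency: x[i] needs the freshly computed x[i - 1],
   so each iteration must wait for the previous one. Compilers
   generally leave a loop like this scalar. */
void running_recurrence(double *x, size_t n, double k)
{
    assert(x != NULL);
    for (size_t i = 1; i < n; ++i)
        x[i] = x[i] + k * x[i - 1];
}
```

With GCC, building at `-O3` with SSE2 enabled and adding `-fopt-info-vec` should report which loops were vectorized; whether a given loop in a real code base vectorizes depends on the compiler version and the surrounding code.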
Chilean Send message Joined: 31 Jul 09 Posts: 12 Credit: 38,068 RAC: 0 |
I did turn on SSE for the windows build. What is the gain in going native 64-bit? I would've thought that going SSE2 would bring a higher gain than 64-bit (I've always associated the 64-bit to better memory addressing, rather than increased computation speed). |
rjs5 Send message Joined: 5 Jul 15 Posts: 22 Credit: 135,602 RAC: 3,244 |
I did turn on SSE for the windows build. All x86_64 CPUs have at least SSE2. My first sentence above does not make much sense, since 64-bit CPUs have SSE2 registers. 64-bit has 16 registers rather than the 8 registers of the 386. There is a substantial reduction in temporary register spills and fills to/from the stack. When you eliminate the traffic to store/restore data to stack variables, you reduce cycles per instruction. Saving registers to a temporary stack variable requires the WRITE be pushed out to the L2 cache, which is typically 5 to 10 cycles. The L1 caches are all write-through. SSE2 and 64-bit come as a pair. |
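The "SSE2 and 64-bit come as a pair" remark can be checked at build time. The sketch below is only an illustration, not part of any Rosetta build; the function names are hypothetical. It relies on predefined compiler macros: `__x86_64__` and `__SSE2__` on GCC and Clang, `_M_X64` and `_M_IX86_FP` on MSVC.

```c
#include <assert.h>

/* Returns 1 when the compiler is targeting 64-bit x86, else 0. */
int targets_x86_64(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when the compiler is allowed to emit SSE2 instructions.
   MSVC does not define __SSE2__: x64 targets always have SSE2, and
   32-bit targets report it through _M_IX86_FP >= 2 (/arch:SSE2). */
int sse2_enabled_at_build(void)
{
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return 1;
#else
    return 0;
#endif
}

/* Aborts in debug builds if a 64-bit x86 build lacks SSE2. */
void check_sse2_pairing(void)
{
    assert(!targets_x86_64() || sse2_enabled_at_build());
}
```

On a 32-bit x86 build, `sse2_enabled_at_build()` returns 1 only when SSE2 is requested explicitly (for example `-msse2` with GCC), which is the case discussed above for a 32-bit Windows application.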
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
What is the gain in going native 64-bit? I would've thought that going SSE2 would bring a higher gain than 64-bit (I've always associated the 64-bit to better memory addressing, rather than increased computation speed). I don't understand. The ralph/rosetta 64-bit app we are currently using is not "native"? |
Chilean Send message Joined: 31 Jul 09 Posts: 12 Credit: 38,068 RAC: 0 |
What is the gain in going native 64-bit? I would've thought that going SSE2 would bring a higher gain than 64-bit (I've always associated the 64-bit to better memory addressing, rather than increased computation speed). It seems only the Linux version is 64-bit. The Windows version is still 32-bit, running with a 64-bit wrapper. |
sgaboinc Send message Joined: 8 Jul 14 Posts: 20 Credit: 4,159 RAC: 0 |
64 bits speeds up double precision floating point maths quite a bit, i'd think this mainly applies to SIMD/SSE/AVX type of computations http://www.roylongbottom.org.uk/linpack%20results.htm it appears there is perhaps a 20% gain between 32 bits & 64 bits |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
No news? |
rjs5 Send message Joined: 5 Jul 15 Posts: 22 Credit: 135,602 RAC: 3,244 |
No news? Some news. I was granted a source license and I have started wading in. The documentation is dated and inaccurate (as always). I am looking to hook up with a developer to focus on the configuration they build for this project and feed back findings. Maybe something will surface. |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 904 Credit: 1,889,390 RAC: 0 |
Some news. I was granted a source license and I have started wading in. The documentation is dated and inaccurate (as always). I am looking to hook up with a developer to focus on the configuration they build for this project and feed back findings. Well done!! |
©2024 University of Washington
http://www.bakerlab.org