Ralph and SSEx

Message boards : Number crunching : Ralph and SSEx

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5841 - Posted: 19 May 2015, 7:42:57 UTC

Interesting discussion about SSEx/AVX extensions for the Rosetta code:
SSE

All CPUs from the Pentium 4 onward (November 2000; on the AMD side, from the Athlon 64/Opteron onward) support the SSE2 register model. Simply adding the SSE2 target option to the builds would require the machines to be made this century, but the floating-point code would then use the SSE registers instead of the x87 stack. The directly addressable XMM registers (8 in 32-bit mode, 16 in 64-bit mode) would reduce register spills to the stack and simplify code scheduling (less shuffling of data around and more computation).

A simple recompile should make a noticeable difference without any side effects. If you target anything newer than SSE2, or GPUs, you have to start worrying about and managing the population of target machines you deliver workloads to.
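Managing that population usually means checking, at run time, which instruction sets the host CPU offers before choosing a build. A minimal sketch, assuming GCC or Clang on x86; the function name `best_simd_level` is hypothetical (not part of Rosetta or BOINC), while `__builtin_cpu_init` and `__builtin_cpu_supports` are real compiler builtins:

```c
/* Hypothetical helper (not part of Rosetta or BOINC): return the highest
 * x86 SIMD level the running CPU reports, so a scheduler could decide
 * which build to send. Relies on the GCC/Clang builtins
 * __builtin_cpu_init and __builtin_cpu_supports, which exist only for x86;
 * on other targets it falls back to "baseline". */
const char *best_simd_level(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init(); /* make sure the CPU feature data is initialized */
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("avx"))    return "avx";
    if (__builtin_cpu_supports("sse4.1")) return "sse4.1";
    if (__builtin_cpu_supports("sse3"))   return "sse3";
    if (__builtin_cpu_supports("sse2"))   return "sse2";
#endif
    return "baseline";
}
```

A caller would branch on the returned string (for example `printf("%s\n", best_simd_level());`) and fall back to the SSE2 or baseline build when a newer extension is missing.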

Chilean
Joined: 31 Jul 09
Posts: 11
Credit: 11,336
RAC: 0
Message 5842 - Posted: 26 May 2015, 13:59:17 UTC

I just added RALPH into my BOINC client to help out in case the admins take notice.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5843 - Posted: 27 May 2015, 8:41:16 UTC - in response to Message 5842.  

I just added RALPH into my BOINC client to help out in case the admins take notice.


In the remote case they do.... :-(

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5848 - Posted: 7 Jun 2015, 20:46:51 UTC

Discussion about SSE continues here

rjs5
Joined: 5 Jul 15
Posts: 16
Credit: 78,332
RAC: 234
Message 5852 - Posted: 5 Jul 2015, 19:25:48 UTC - in response to Message 5848.  

Ok, ok. I joined the project with a Win7 and a Fedora 21 Haswell machine. Now if David wants to feed me beta binaries for comment, he can.

Since this is a beta site, he can even send the same workload out repeatedly ... and can even hide it somewhat with different names and few will even know ... 8-)

Sending the same workload will allow him to test for the acceptable results on different machine configurations.

I really hate joining inactive projects.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5860 - Posted: 26 Aug 2015, 13:23:32 UTC - in response to Message 5852.  

I really hate joining inactive projects.


+1
September is here, waiting for WUs...

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 221
Credit: 520,632
RAC: 598
Message 5862 - Posted: 28 Aug 2015, 17:40:39 UTC

I just updated the minirosetta_beta app. I did not include SSE Linux builds since they will require more testing. I did turn on SSE for the Windows build. The latest Linux SSE3 test was causing a significant number of failures.

We don't have much time/resources to test these optimizations and it would be great if any of you would like to volunteer to help. As stated before, we can provide the source, build instructions, and tests. If you are interested, please contact me directly at dekim AT u.washington.edu


[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5863 - Posted: 28 Aug 2015, 18:12:00 UTC - in response to Message 5862.  

I did turn on SSE for the windows build.


Just a curiosity: which version of SSE for win? 2? 3? 4.1???

Chilean
Joined: 31 Jul 09
Posts: 11
Credit: 11,336
RAC: 0
Message 5869 - Posted: 29 Aug 2015, 20:15:38 UTC - in response to Message 5862.  

rjs5, over at this thread in R@H, seemed very willing to help; I don't know if you guys talked about the source code, et cetera, via private messages.

Chilean
Joined: 31 Jul 09
Posts: 11
Credit: 11,336
RAC: 0
Message 5870 - Posted: 29 Aug 2015, 20:18:50 UTC - in response to Message 5863.  

Just a curiosity: which version of SSE for win? 2? 3? 4.1???


I think it's only SSE "1" so far. No errors except for a few WUs that failed immediately, but that doesn't appear to be an SSE-related problem.

My WUs with SSE

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5872 - Posted: 29 Aug 2015, 22:05:59 UTC - in response to Message 5870.  

I think it's only SSE "1" so far.


Looking at other BOINC projects, I think SSE2 is the minimum needed to see some improvement.

rjs5
Joined: 5 Jul 15
Posts: 16
Credit: 78,332
RAC: 234
Message 5873 - Posted: 29 Aug 2015, 22:53:27 UTC - in response to Message 5870.  

I think it's only SSE "1" so far.



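SSE was introduced in 1999 with the Pentium-3 CPU, and SSE2 was introduced in 2000 with the Pentium-4 CPU; SSE2 only extended SSE (mainly with double-precision floating-point and integer operations on the same 128-bit XMM registers). Any machine that can run an SSE build can also run an SSE2 build UNLESS it has a Pentium-3-era CPU.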

The project will get more work done by sacrificing the Pentium-3 cycles (making SSE2 the minimum) and optimizing for SSE2+.

Once you get to SSE2, you will only get minor improvements, probably just a couple of percent, by going to the trouble of pushing the SSE/AVX envelope.

Since R@H is compiled to run in SCALAR mode, which crunches only one 64-bit value in each 128-bit XMM register (which has room for two 64-bit values), there is much more to gain by closely examining the source code and understanding what is preventing the compilers from VECTORIZING the code. If you can use BOTH 64-bit fields in the XMM registers, you get up to a 2x performance increase. With wider vector registers (AVX and beyond) you crunch 2, 4, 8, ... floating-point values in the same time as 1.
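To make the scalar-versus-vector point concrete, here is a minimal C sketch (the function names are hypothetical, not taken from the Rosetta source). `add_scalar` handles one double per iteration; `add_packed` uses the SSE2 intrinsics from `<emmintrin.h>` so that each `_mm_add_pd` adds both 64-bit halves of an XMM register at once. An optimizing compiler (e.g. GCC at `-O3`) may auto-vectorize the scalar loop by itself, which is exactly the transformation the post is talking about.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* Scalar version: one double per loop iteration. */
void add_scalar(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Packed version: each 128-bit XMM register holds two doubles, so one
 * _mm_add_pd performs two additions. Without SSE2 the whole array is
 * handled by the scalar loop at the end. */
void add_packed(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i); /* loads a[i] and a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i); /* loads b[i] and b[i+1] */
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    for (; i < n; ++i) /* scalar tail for an odd leftover element */
        out[i] = a[i] + b[i];
}
```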

This is also the reason there is no GPU version, and there can NEVER be a GPU version until this is fixed .... IF the source can be changed to vectorize.
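A common reason compilers refuse to vectorize is a dependency between loop iterations, or possible overlap between arrays. A minimal sketch with hypothetical function names: in `scale` the iterations are independent and C99 `restrict` promises the arrays do not overlap; in `running_product` each iteration needs the previous result, so the iterations cannot simply be spread across SIMD lanes (or GPU threads). GCC's `-fopt-info-vec-missed` (or Clang's `-Rpass-missed=loop-vectorize`) can report loops the compiler failed to vectorize.

```c
#include <stddef.h>

/* Independent iterations: out[i] depends only on in[i], so a compiler
 * may process several i at once in SIMD lanes. restrict tells it that
 * out and in never overlap, removing an aliasing obstacle. */
void scale(double *restrict out, const double *restrict in, double k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = k * in[i];
}

/* Loop-carried dependency: x[i] needs x[i-1] from the previous iteration,
 * so the iterations cannot simply run side by side in SIMD lanes. */
void running_product(double *x, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        x[i] *= x[i - 1];
}
```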


Starting from a generic, crappy 32-bit i386 version, .....
you get 80% of the scalar performance by just generating a 64-bit version.
you get the other 20% of scalar performance by messing with compiler options .... but at a high portability cost.

The next barrier after a 64-bit version should be SSE2.
The next barrier after 64-bit, SSE2 is VECTOR .... NOT .... SSE3, SSE4, AVX, ...
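Which of these steps a given binary was built for can be checked from inside the program through predefined compiler macros. A minimal sketch; `build_level` and `example.c` are hypothetical names, while `-m32`, `-m64`, `-march=i386`, `-msse2` and `-mfpmath=sse` are real GCC options and `__x86_64__`, `__i386__` and `__SSE2_MATH__` are macros GCC and Clang predefine:

```c
/* Hypothetical helper: report which build level this binary targets.
 * Example GCC invocations (example.c is a placeholder file name):
 *   gcc -O2 -m32 -march=i386 example.c           -> "32-bit x87"
 *   gcc -O2 -m32 -msse2 -mfpmath=sse example.c   -> "32-bit SSE2"
 *   gcc -O2 -m64 example.c                       -> "64-bit SSE2"
 * Only GCC/Clang macros are checked here. */
const char *build_level(void)
{
#if defined(__x86_64__)
    return "64-bit SSE2";   /* SSE2 is part of the x86-64 baseline */
#elif defined(__i386__) && defined(__SSE2_MATH__)
    return "32-bit SSE2";   /* scalar floating point done in XMM registers */
#elif defined(__i386__)
    return "32-bit x87";    /* floating point on the 8-deep x87 stack */
#else
    return "non-x86";
#endif
}
```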




Chilean
Joined: 31 Jul 09
Posts: 11
Credit: 11,336
RAC: 0
Message 5874 - Posted: 30 Aug 2015, 23:49:24 UTC - in response to Message 5873.  
Last modified: 30 Aug 2015, 23:49:42 UTC

Starting from a generic, crappy 32-bit i386 version, .....
you get 80% of the scalar performance by just generating a 64-bit version.






What is the gain in going native 64-bit? I would've thought that going SSE2 would bring a higher gain than 64-bit (I've always associated the 64-bit to better memory addressing, rather than increased computation speed).

rjs5
Joined: 5 Jul 15
Posts: 16
Credit: 78,332
RAC: 234
Message 5875 - Posted: 31 Aug 2015, 0:42:09 UTC - in response to Message 5874.  

What is the gain in going native 64-bit? I would've thought that going SSE2 would bring a higher gain than 64-bit (I've always associated the 64-bit to better memory addressing, rather than increased computation speed).


All x86_64 CPUs have at least SSE2. My earlier statement that SSE2 is the next barrier after a 64-bit version does not make much sense, since 64-bit builds always have SSE2 registers.

64-bit mode has 16 general-purpose and 16 XMM registers rather than the 8 of the 386. That gives a substantial reduction in temporary register spills and fills to/from the stack. When you eliminate the traffic to store/restore data to stack variables, you reduce cycles per instruction. Each spill is a store to a temporary stack variable and each fill a reload from it; even when both hit in the L1 data cache, they add several cycles of latency and extra memory operations compared with keeping the value in a register.

SSE2 and 64-bit come as a pair.
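Register pressure is easiest to see with many values live at once. A minimal sketch (the function `dot12` is hypothetical): twelve independent partial sums exceed the 8 XMM registers available in 32-bit mode, so a scalar 32-bit SSE2 build may need to spill some of them to the stack, while the 16 XMM registers of x86-64 can hold them all. Whether spills actually happen depends on the compiler and flags; comparing the assembly from `gcc -O2 -S -m32 -msse2 -mfpmath=sse` and `gcc -O2 -S -m64` is one way to check.

```c
#include <stddef.h>

enum { NACC = 12 }; /* number of independent partial sums */

/* Dot product with NACC accumulators. In scalar code each accumulator
 * wants its own register; with more live values than registers, the
 * compiler must keep some in stack slots and reload them each pass. */
double dot12(const double *a, const double *b, size_t n)
{
    double s[NACC] = {0};
    size_t i = 0;
    for (; i + NACC <= n; i += NACC)
        for (int k = 0; k < NACC; ++k) /* compilers often fully unroll this */
            s[k] += a[i + k] * b[i + k];

    double total = 0.0;
    for (int k = 0; k < NACC; ++k)     /* combine the partial sums */
        total += s[k];
    for (; i < n; ++i)                 /* leftover elements */
        total += a[i] * b[i];
    return total;
}
```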



[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5876 - Posted: 31 Aug 2015, 7:44:45 UTC - in response to Message 5874.  
Last modified: 31 Aug 2015, 7:44:59 UTC

What is the gain in going native 64-bit? I would've thought that going SSE2 would bring a higher gain than 64-bit (I've always associated the 64-bit to better memory addressing, rather than increased computation speed).


I don't understand.
The 64-bit RALPH/Rosetta app we are currently using is not "native"?

Chilean
Joined: 31 Jul 09
Posts: 11
Credit: 11,336
RAC: 0
Message 5878 - Posted: 31 Aug 2015, 20:29:35 UTC - in response to Message 5876.  

I don't understand.
The 64-bit RALPH/Rosetta app we are currently using is not "native"?


It seems only the Linux version is 64-bit. The Windows version is still a 32-bit app running with a 64-bit wrapper.

sgaboinc
Joined: 8 Jul 14
Posts: 16
Credit: 2,855
RAC: 0
Message 5891 - Posted: 28 Sep 2015, 8:27:15 UTC - in response to Message 5878.  
Last modified: 28 Sep 2015, 8:50:13 UTC

64-bit speeds up double-precision floating-point maths quite a bit; I'd think this mainly applies to SIMD/SSE/AVX-type computations.

http://www.roylongbottom.org.uk/linpack%20results.htm

It appears there is perhaps a 20% gain going from 32-bit to 64-bit.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5939 - Posted: 18 Dec 2015, 17:20:18 UTC

No news?

rjs5
Joined: 5 Jul 15
Posts: 16
Credit: 78,332
RAC: 234
Message 5940 - Posted: 19 Dec 2015, 7:55:47 UTC - in response to Message 5939.  

No news?


Some news. I was granted a source license and I have started wading in. The documentation is dated and inaccurate (as always). I am looking to hook up with a developer to focus on the configuration they build for this project and feed back findings.

Maybe something will surface.



[VENETO] boboviz
Joined: 9 Apr 08
Posts: 558
Credit: 936,557
RAC: 4,066
Message 5941 - Posted: 19 Dec 2015, 20:37:45 UTC - in response to Message 5940.  

Some news. I was granted a source license and I have started wading in. The documentation is dated and inaccurate (as always). I am looking to hook up with a developer to focus on the configuration they build for this project and feed back findings.

Maybe something will surface.


Well done!!

©2018 University of Washington
http://www.bakerlab.org