Ralph and SSEx

Message boards : Number crunching : Ralph and SSEx

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6031 - Posted: 30 Jan 2016, 15:08:58 UTC - in response to Message 6028.  

the commands are run in linux, but i'd guess u may have figured out what this means :D


You're right!
For the first time, I've analyzed the x86_64.exe with the MS Process Explorer tool,
and it's a 32-bit native image.
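
The same check can be done without Process Explorer by reading the PE header directly. Below is a minimal sketch in C, assuming the standard PE/COFF layout: the 4-byte offset stored at 0x3C in the DOS header points to the "PE\0\0" signature, and the 2-byte Machine field follows it (0x014c for 32-bit x86, 0x8664 for x86-64). The function names pe_machine and pe_machine_name are made up for illustration, and the caller is assumed to have already loaded the start of the .exe into a buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Machine values from the PE/COFF file header. */
#define PE_MACHINE_I386  0x014c  /* 32-bit x86 */
#define PE_MACHINE_AMD64 0x8664  /* 64-bit x86-64 */

/* Read a little-endian integer of nbytes bytes, independent of host byte order. */
static uint32_t read_le(const unsigned char *p, int nbytes)
{
    uint32_t v = 0;
    for (int i = nbytes - 1; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: return the Machine field of a PE image in buf,
   or 0 if the buffer does not look like a PE file. */
unsigned pe_machine(const unsigned char *buf, size_t len)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                              /* no DOS header */
    uint32_t off = read_le(buf + 0x3c, 4);     /* offset of the PE signature */
    if (off > len || len - off < 6)
        return 0;                              /* signature + Machine out of range */
    if (memcmp(buf + off, "PE\0\0", 4) != 0)
        return 0;                              /* signature mismatch */
    return read_le(buf + off + 4, 2);          /* Machine field */
}

/* Hypothetical helper: human-readable name for the two values of interest. */
const char *pe_machine_name(unsigned machine)
{
    switch (machine) {
    case PE_MACHINE_I386:  return "32-bit x86";
    case PE_MACHINE_AMD64: return "64-bit x86-64";
    default:               return "unknown or not PE";
    }
}
```

Reading the first few kilobytes of the file with fread and passing them to pe_machine should be enough for a typical executable, since the PE header normally sits near the start of the file.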

rjs5
Joined: 5 Jul 15
Posts: 22
Credit: 135,787
RAC: 2,494
Message 6032 - Posted: 31 Jan 2016, 20:28:58 UTC - in response to Message 6029.  

On this topic, it is also worth mentioning that modern compilers are sophisticated. Even recent versions of open-source compilers such as gcc and llvm have fairly advanced *auto-vectorization* features.

https://gcc.gnu.org/projects/tree-ssa/vectorization.html
http://llvm.org/devmtg/2012-04-12/Slides/Hal_Finkel.pdf

While that may not produce the most finely tuned code, it is probably incorrect to assume that r@h has no SSEn/AVXn optimizations; the compiler may have emitted some SSEn/AVXn instructions on its own.

This may partly explain the higher performance of r@h on 64-bit Linux versus 64-bit Windows in the statistics: optimized 64-bit binaries running on 64-bit Linux would most likely perform better (possibly significantly) than 32-bit (possibly less optimized) binaries running on 64-bit Windows.

That is, the Windows platform may see (significant) performance gains just from compiling and releasing 64-bit binaries targeting 64-bit Windows with a recent, sophisticated compiler.


It depends on what you mean by "significant" gains. I would guess that the gains would be 10% to 20% over the current 32-bit binary.


Here is a link that shows how sensitive auto-vectorization is to the source code layout:
http://locklessinc.com/articles/vectorize/

Rosetta code does not have any AVX code but does have scalar SSE code. I think there is SSE code even in the 32-bit Windows build. All the applications still have 387 (x87 FPU) code. It will take some time and source code changes to generate any vector code.
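
To illustrate the kind of source change involved, here is a small sketch in C. It is not Rosetta code, and the function names are hypothetical. Both loops compute the same result, but in the first one the compiler has to assume the two arrays may overlap, so it typically either adds a runtime overlap check or keeps the loop scalar. In the second one, restrict promises there is no overlap, which usually makes packed SSE/AVX code easier to generate.

```c
#include <stddef.h>

/* Version A: out and in may alias as far as the compiler knows,
   so vectorizing requires an extra overlap check or is skipped. */
void scale_add(float *out, const float *in, float a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = out[i] + a * in[i];
}

/* Version B: restrict tells the compiler the arrays do not overlap,
   so the loop iterations are independent and can be packed. */
void scale_add_restrict(float *restrict out, const float *restrict in,
                        float a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = out[i] + a * in[i];
}
```

With gcc, building with -O3 -fopt-info-vec reports which loops were vectorized; clang has -Rpass=loop-vectorize for the same purpose. Whether either version ends up as vector code depends on the compiler version and target flags, so the report is worth checking rather than assuming.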


Those who have time can install a Linux guest environment on their Windows machine and test the 32-bit Windows, 32-bit Linux and 64-bit Linux performance on the same hardware.


One of many ways would be to install VirtualBox:
https://www.virtualbox.org/wiki/Downloads

Download a prebuilt Linux image that you are interested in:
http://www.osboxes.org/virtualbox-images/

Install the BOINC package on that guest.

Run the Rosetta application.




[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6053 - Posted: 4 Mar 2016, 16:31:55 UTC

GCC supports HSA

Heterogeneous Systems Architecture support [2016-01-27]
Heterogeneous Systems Architecture 1.0 support was added to GCC, contributed by Martin Jambor, Martin Liška and Michael Matz from SUSE.


https://gcc.gnu.org/gcc-6/changes.html#hsa

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6186 - Posted: 9 May 2017, 20:33:14 UTC

Waiting for Godot....

Dr. Merkwürdigliebe
Joined: 12 Jun 15
Posts: 16
Credit: 23,473
RAC: 0
Message 6187 - Posted: 9 May 2017, 20:40:29 UTC

There will be no such changes to the code base. Posts concerning OpenCL or HSA are pointless.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6188 - Posted: 10 May 2017, 6:46:30 UTC - in response to Message 6187.  
Last modified: 10 May 2017, 6:47:34 UTC

There will be no such changes to the code base. Posts concerning OpenCL or HSA are pointless.


This thread is not about OpenCL (I posted HSA only for info), but about CPU optimizations, which are feasible. I know that GPU support in this project is simply a utopia :-P

Dr. Merkwürdigliebe
Joined: 12 Jun 15
Posts: 16
Credit: 23,473
RAC: 0
Message 6189 - Posted: 10 May 2017, 14:12:29 UTC - in response to Message 6188.  

There will be no CPU optimizations either...there is either a lack of knowledge or a lack of interest, probably the former.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6190 - Posted: 11 May 2017, 12:47:56 UTC - in response to Message 6189.  

there is either a lack of knowledge or a lack of interest, probably the former.


The latter, in my opinion...

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6192 - Posted: 10 Jun 2017, 10:32:32 UTC

A volunteer (quantumethos) has been posting links about optimization to help the developers. Here are some examples:
Agner optimize
PGI Compiler
Boinc optimize

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6246 - Posted: 26 Nov 2017, 18:41:15 UTC - in response to Message 5862.  

I just updated the minirosetta_beta app. I did not include SSE linux builds since it will require more testing. I did turn on SSE for the windows build. The latest linux SSE3 test was causing a significant amount of failures.


After more than 2 years, have you tried these optimizations again?
SSEx for Windows worked well, I think, so why not use it?

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6551 - Posted: 7 May 2018, 10:39:50 UTC

GCC 8.1 is ready for public release
AOCC 1.2 (AMD Optimizing Compiler for C/C++)
C++17 is an ISO standard.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6577 - Posted: 6 Jul 2018, 13:00:32 UTC

Interesting slides from DHPCC++ 2018
Visual Studio Code is at version 1.25

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6593 - Posted: 7 Feb 2019, 17:03:54 UTC

A few days ago, a volunteer optimized the code of Acoustics@home with SSE/AVX extensions.
The results are great:
Acoustics

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6597 - Posted: 3 Apr 2019, 8:20:25 UTC

Visual Studio 2019 released
C++20 is now feature complete (the specification will be ready in July).
GCC 9 is ready to be released in the coming weeks

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6601 - Posted: 3 May 2019, 11:55:42 UTC - in response to Message 6597.  

GCC 9 is ready to be released in the coming weeks


GCC 9.1 has been released, the first stable release of GCC 9.x.
Some features:
- The D programming language front-end has finally been mainlined in GCC! There is now D support beginning with GCC 9.

- Initial support for the Arm Neoverse N1 processors along with other existing AArch64 Cortex processors.

- Initial support for Intel Cascadelake server processors with AVX-512 VNNI (DL BOOST) via the -march=cascadelake flag.

- Initial support for OpenMP 5.0.

- Nearly complete support for the OpenACC 2.5 specification.

- Experimental support for C++2A, the next revision of C++ (likely to be called C++20), is exposed via the "-std=c++2a" switch. There is also work on the C++ standard library side (libstdc++) and elsewhere, while for C++17 there is an initial implementation of the parallel algorithms.

- Along similar lines, there is also experimental support for C2X as the next C language revision and that is exposed via the -std=c2x switch.

- Fortran support in GCC has also been improved and now handles asynchronous I/O, among other features.

- Improvements to inter-procedural optimizations (IPO), profile-driven optimizations, link-time optimizations (LTO), and a variety of other optimizations aimed at better generated code.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 6629 - Posted: 14 Feb 2020, 13:33:12 UTC - in response to Message 6577.  

Visual Studio Code is at version 1.25

Now it's 1.42.1

©2024 University of Washington
http://www.bakerlab.org