- 13 Jan, 2014 5 commits
-
-
solomon authored
-
Edgar Solomonik authored
-
Edgar Solomonik authored
-
-
solomon authored
Attempted to tune the mapping performance model a bit; still relying on heuristic choices of the flops-to-comm efficiency ratio.
-
- 12 Jan, 2014 3 commits
- 10 Jan, 2014 1 commit
-
-
solomon authored
-
- 09 Jan, 2014 1 commit
-
-
Edgar Solomonik authored
Turned off virtualization control and added a basic performance model for dgemm, currently expressed artificially in terms of communication. Improves CCSD performance by 2X on 1024 nodes of BG/Q. However, CCSDT still seems to have trouble finding a good mapping for a specific contraction on this number of nodes.
-
- 24 Dec, 2013 2 commits
-
-
solomon authored
-
Edgar Solomonik authored
-
- 23 Dec, 2013 4 commits
-
-
solomon authored
-
solomon authored
-
solomon authored
-
solomon authored
This commit contains a number of major changes and removals of unused source, aimed at slimming down the source code in preparation for a release. Switched to using CommData_t by value instead of by reference. MPI_Comms are now freed via a std::set. Got rid of the deprecated subdirectory. Got rid of inner mapping functionalities, which were not used because they were inefficient. Got rid of the old unit tests contained in the unit_test subdirectory (a somewhat questionable decision), including the code needed for the VERIFY functionality used in dist_tensor_op.cxx. The old unit tests were rather limited and at this point somewhat redundant, but there may be reason to use the VERIFY functionality in the future.
-
- 20 Dec, 2013 1 commit
-
-
solomon authored
-
- 19 Dec, 2013 3 commits
-
-
Edgar Solomonik authored
-
Edgar Solomonik authored
No longer scaling available memory on BGQ by 1/CTF_PPN, because this is done automatically by the kernel heap memory probe. Also no longer ABORTing when a mapping is not found, which should make the desymmetrization decision behave more correctly.
-
solomon authored
-
- 18 Dec, 2013 1 commit
-
-
- 17 Dec, 2013 4 commits
-
-
Edgar Solomonik authored
-
solomon authored
-
solomon authored
-
Justin Turney authored
-
- 14 Dec, 2013 1 commit
-
-
Justin Turney authored
-
- 13 Dec, 2013 2 commits
-
-
Edgar Solomonik authored
-
Edgar Solomonik authored
-
- 12 Dec, 2013 1 commit
-
-
solomon authored
-
- 07 Dec, 2013 1 commit
-
-
solomon authored
-
- 06 Dec, 2013 4 commits
- 05 Dec, 2013 3 commits
-
-
Edgar Solomonik authored
-
Edgar Solomonik authored
-
Edgar Solomonik authored
Made counts be allocated with mst_alloc outside omp; still need to check whether this has a negative effect on performance. Also slightly modified the omp for loop that moves the data in a way that does not change the logic, but apparently makes Intel OpenMP behave properly.
-
- 03 Dec, 2013 1 commit
-
-
solomon authored
-
- 02 Dec, 2013 1 commit
-
-
solomon authored
The length array was not allocated large enough in calc_idx. This only showed up with icc; sad that gcc let that through.
-
- 29 Nov, 2013 1 commit
-
-
solomon authored
Better threading for zero_padding, which seems to work well. nosym_transpose seems to be the biggest problem left, but I lack a fast cxcopy implementation, so maybe it'll be better on other machines.
-