-
v1.31 Version 1.31
New features/updates:
1. Renewed support for CUDA-based offloading.
2. New macro -DTUNE activates model tuning, which adjusts performance model parameters. A benchmark that executes a characteristic training set is included in bench/model_trainer and can be used to tune models for any architecture. The output at the end of the benchmark should be pasted into src/shared/init_models.cxx. Running with -DTUNE always on is not advisable. The current parameters are based on 16-node Edison runs with 4 processes per node and should be reasonable in most settings. A cubic polynomial model is also available, but is not used because of its inferior observed modelling quality.
3. Added a mapping search that exhaustively tries all possible mappings. As no effective benefit was observed from this expanded search space and the search itself slowed down low-cost contractions (e.g. in CCSDT), this mapping search is only done when the time estimated by the old scheme is longer than 1 second.
4. Fixed up the configure file and the generated config.mk file to be more effective and better documented.
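The gating rule in item 3 can be sketched as follows. This is only an illustration of the described behaviour, not CTF's actual code: the `Mapping` struct, `pick_best`, `select_mapping`, and the constant name are hypothetical, and each candidate mapping is assumed to already carry a modelled execution time.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a candidate mapping; only its modelled time matters here.
struct Mapping {
  int id;              // illustrative identifier
  double est_seconds;  // time predicted by the performance model
};

// Threshold taken from the release note: exhaustive search only above 1 second.
constexpr double kExhaustiveThresholdSeconds = 1.0;

// Returns the candidate with the smallest estimated time (candidates must be non-empty).
inline Mapping pick_best(const std::vector<Mapping>& candidates) {
  assert(!candidates.empty());
  Mapping best = candidates.front();
  for (const Mapping& m : candidates) {
    if (m.est_seconds < best.est_seconds) best = m;
  }
  return best;
}

// Chooses from the old scheme's candidates first, and consults the exhaustive
// candidate set only when the old scheme's best estimate exceeds the threshold.
inline Mapping select_mapping(const std::vector<Mapping>& old_scheme,
                              const std::vector<Mapping>& exhaustive) {
  Mapping best = pick_best(old_scheme);
  if (best.est_seconds <= kExhaustiveThresholdSeconds || exhaustive.empty()) {
    return best;  // cheap contraction: skip the expensive search
  }
  Mapping alt = pick_best(exhaustive);
  return (alt.est_seconds < best.est_seconds) ? alt : best;
}
```

In a real implementation the exhaustive candidates would be generated lazily, only after the threshold check, since enumerating them is the expensive part; the sketch passes them in precomputed purely to keep the example short.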
-
v1.3.0 Merge branch 'sparsity'
-
v1.2.3 label update v1.22 -> v1.23
-
v1.22 Latest stable version, with performance improvements. Slight bug fixes and interface changes were made, and as a result the tag was moved up from the previous commit; performance should not change.
-
v1.2.2 Latest stable version, with performance improvements. Slight bug fixes and interface changes were made, and as a result the tag was moved up from the previous commit; performance should not change.
-
v1.2.1
1. Added capability (via constructor) to repack tensor data into a different packed symmetric layout without doing the normal symmetrization permutations that happen during sum.
2. Fixed a bug associated with C[ij]=A[ij]*B[ijkl], where C and A are SY and B is fully NS.
3. Getting (2) to work in parallel required resolving a bug associated with desymmetrization (in the two unfold_broken_Sym functions), which were changing symmetry without changing sym_table and, as a result, yielding extra mapping restrictions. The fix in (3) might improve performance generally.
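For readers unfamiliar with the index notation in item 2, the following is a dense, serial reference of what C[ij]=A[ij]*B[ijkl] is assumed to compute: indices that appear only on the right-hand side (k and l) are summed over. It deliberately ignores packed symmetric (SY) storage and parallel distribution, does not use the CTF API, and the function name `contract_reference` and the row-major layout are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

// Dense reference for C(i,j) = A(i,j) * sum over k,l of B(i,j,k,l).
// All dimensions have length n; tensors are stored row-major in flat vectors.
// Symmetry of A and C is not exploited here (full storage is assumed).
inline std::vector<double> contract_reference(const std::vector<double>& A,
                                              const std::vector<double>& B,
                                              std::size_t n) {
  std::vector<double> C(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double sum_kl = 0.0;
      for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t l = 0; l < n; ++l) {
          sum_kl += B[((i * n + j) * n + k) * n + l];
        }
      }
      C[i * n + j] = A[i * n + j] * sum_kl;
    }
  }
  return C;
}
```

A reference like this is mainly useful for checking a distributed result on small sizes, for example by comparing it entry by entry against the output of the library for a symmetric A.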
-
v1.2.0 Corrected the set()-to-scalar function for tensors, added an appropriate test to scalar.cxx, and also fixed a new bug in tensor print().
-
v1.1.0 Version 1.1: GPU support added, non-redistributed mappings explicitly considered, performance models improved, miscellaneous bugs fixed
-
v1.1 Version 1.1: GPU support added, non-redistributed mappings explicitly considered, performance models improved, miscellaneous bugs fixed
-
v1.0.0 First release of Cyclops Tensor Framework (CTF). Tensors up to 6D (CCSDT) tested on icc, gcc, and xlc. CTF instantiated for double and complex<double>.
-
v1.0 First release of Cyclops Tensor Framework (CTF). Tensors up to 6D (CCSDT) tested on icc, gcc, and xlc. CTF instantiated for double and complex<double>.