v1.31 Version 1.31 New features/updates: 1. Renewed support for CUDA-based offloading. 2. New macro -DTUNE activates model tuning, which adjusts performance model parameters. A benchmark that executes a characteristic training set is included in bench/model_trainer and can be used to tune models for any architecture. The output at the end of the benchmark should be pasted into src/shared/init_models.cxx. Running with -DTUNE always on is not advisable. Current parameters are based on 16-node Edison runs with 4 processes per node and should be reasonable in most settings. A cubic polynomial model is also available, but is not used because its observed modelling quality was inferior. 3. Added a mapping search that exhaustively tries all possible mappings. As no effective benefit was observed from this expanded search space and the search itself slowed down low-cost contractions (e.g. in CCSDT), this mapping search is only done when the time estimated by the old scheme is longer than 1 second. 4. Fixed up the configure file and the generated config.mk file to be more effective and better documented.
v1.3.0 Merge branch 'sparsity'
v1.2.3 label update v1.22 -> v1.23
v1.22 Latest stable version, with performance improvements. Slight bug fixes and interface changes were made, so the tag was moved up from the previous commit; performance should not change.
v1.2.2 Latest stable version, with performance improvements. Slight bug fixes and interface changes were made, so the tag was moved up from the previous commit; performance should not change.
v1.2.1 1. Added capability (via constructor) to repack tensor data into a different packed symmetric layout without doing the normal symmetrization permutations that happen during sum. 2. Fixed bug associated with C[ij]=A[ij]*B[ijkl] where C and A are SY and B is fully NS. 3. Getting (2) to work in parallel required resolving a desymmetrization bug in the two unfold_broken_Sym functions, which were changing symmetry without changing sym_table and, as a result, yielding extra mapping restrictions. The bug fix in (3) might improve performance generally.
v1.2.0 Corrected the set()-to-scalar function for tensors, added an appropriate test to scalar.cxx, and fixed a new bug in tensor print().
v1.1.0 Version 1.1: GPU support added, non-redistributed mappings explicitly considered, performance models improved, miscellaneous bugs fixed
v1.1 Version 1.1: GPU support added, non-redistributed mappings explicitly considered, performance models improved, miscellaneous bugs fixed
v1.0.0 First release of Cyclops Tensor Framework (CTF). Tensors up to 6D (CCSDT) tested on icc, gcc, and xlc. CTF instantiated for double and complex<double>.
v1.0 First release of Cyclops Tensor Framework (CTF). Tensors up to 6D (CCSDT) tested on icc, gcc, and xlc. CTF instantiated for double and complex<double>.