17 753 889 197 660 635 632 updated on 2024.02.17 and still to be confirmed. (17 753 889 189 701 385 264 updated on 2023.09.07.) (17 753 889 189 701 384 304 reported on 2023.07.18.)
This result is consistent with stochastic estimates 1.7745(16)·10^19 [1], 1.775392(12)·10^19 [2] and 1.77543(73)·10^19 [3].
Using hundreds of GPUs rented from cloud services, the counting took about six months to complete. Although the number and models of GPUs used varied over time, the total computation time amounts to about 80,000 hours of GeForce RTX-4090.
Because of the extraordinary volume of the calculation, the possibility that the result is contaminated by accidental errors cannot easily be ruled out. I am currently performing a thorough double-check, which is 70% complete. I would appreciate confirmation or refutation by others.
In the enumeration, more than five hundred instances were created, and some of them were unfortunately faulty and produced wrong results. The following are the cases I have discovered so far by recounting.
Another erroneous instance was found. It ran with an RTX-4090 for about one month and produced about 19,000 sub-subtotals. Out of those sub-subtotals, 6 were incorrect. All of the incorrect results were produced in the last hour of the instance's lifetime. After the erroneous behavior, the GPU of the instance became unusable, reporting the error message “invalid memory access”.
As a result of the correction, the number increased by 7 959 250 368 (331 635 432 × 24).
During the thorough double-check, it was discovered that a portion of the results generated by an instance was incorrect. The instance ran with two RTX-4090s for 60 hours and generated 3,771 sub-subtotals. Out of those sub-subtotals, only 12 were incorrect and all incorrect results were generated by only one of the two RTX-4090s. It is unlikely that these errors are due to logical flaws or coding mistakes. Hardware defects or instability are the most probable causes.
As a result of the correction, the number increased by 960 (40 × 24).
While these errors have not damaged my confidence in the logic and the code used in the calculation, it is possible that errors of similar nature may still be contained in the result. Therefore, the results should be considered unconfirmed until the thorough double-check is completed.
The code used in the initial enumeration was discovered to contain a mistake related to GPU thread synchronization. A corrected version of the code is currently running for the thorough double-check. No discrepancy due to the mistake has been found so far.
Since the number is too large to count in a single task, the entire task is divided into numerous small sub-tasks. Counts for the sub-tasks are available.
Strategies in counting magic squares
CUDA code (corrected on 2023.11.28 and updated on 2024.04.10)
nvcc -O3 -arch=sm_60 -maxrregcount=40 -Wno-deprecated-declarations ms.cu -lcrypto
Add -DnoMD5 and drop -Wno-deprecated-declarations and -lcrypto.
-DN=order
./a.out
./a.out dummy representative_magic_series_in_hex
./a.out dummy1 representative_magic_series_in_hex dummy2 2nd_largest_magic_series_in_hex
Non-CUDA code in C using pthread (updated on 2023.09.18)
gcc -O3 -DNTH=number_of_threads ms.c -lpthread -lcrypto
-DnoMD5 and -DN=order have the same effects as in the CUDA code.