GFLOPS = number of cores × core frequency (GHz) × number of operations per clock cycle

For this equation, use physical cores, not logical cores (threads). The number of operations a core can complete per clock cycle varies with the processor's architecture and with whether you're after single- or double-precision figures.

With any recent GPU, you can compute the theoretical GFLOPS as (number of shaders) × (clock speed in GHz) × 2, where the 2 is there because an FMA counts as two operations by convention. There is also the issue of how well you can actually exhaust that computational capability. In simple synthetics, you can get something like 99% of theoretical peak performance on Maxwell or the various GCN cards, though not on ...

A dual-core Haswell or later only has to run at 1.6 GHz to do 100 GFLOPS FP32. Admittedly, that's a lot of wasted precision and it's probably still less efficient per FLOP at inferencing than ...

I'm confused about how many FLOPs per cycle per core can be done with Sandy Bridge and Haswell. As I understand it, it should be 4 FLOPs per cycle per core with SSE and 8 FLOPs per cycle per core with AVX/AVX2. This seems to be verified here, "How do I achieve the theoretical maximum of 4 FLOPs per cycle?", and here, "Sandy-Bridge CPU specification".
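
As a rough illustration of the peak-throughput formula, here is a minimal Python sketch. The function name `peak_gflops` and its parameters are made up for this example. The figure of 32 FP32 FLOPs per cycle per core for Haswell is an assumption implied by the "100 GFLOPS at 1.6 GHz on two cores" claim (two 256-bit FMA units × 8 FP32 lanes × 2 operations per FMA), and the GPU numbers are hypothetical, not any real card's specification.

```python
def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: units x clock (GHz) x FLOPs per cycle per unit.

    `units` is physical CPU cores (not threads) or GPU shaders.
    """
    if units <= 0 or clock_ghz <= 0 or flops_per_cycle <= 0:
        raise ValueError("all inputs must be positive")
    return units * clock_ghz * flops_per_cycle


# CPU case: dual-core Haswell at 1.6 GHz.
# Assumption: 32 FP32 FLOPs per cycle per core
# (2 FMA units x 8 FP32 lanes x 2 operations per FMA).
haswell_dual = peak_gflops(units=2, clock_ghz=1.6, flops_per_cycle=32)

# GPU case: a hypothetical card with 2048 shaders at 1.0 GHz.
# The factor 2 is the "one FMA counts as two operations" convention.
hypothetical_gpu = peak_gflops(units=2048, clock_ghz=1.0, flops_per_cycle=2)

print(f"dual-core Haswell, FP32: {haswell_dual:.1f} GFLOPS")
print(f"hypothetical GPU, FP32:  {hypothetical_gpu:.1f} GFLOPS")
```

These are theoretical ceilings only. How close real code gets depends on how well it keeps the FMA units fed, which is the "exhausting that computational capability" caveat above.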

