That’s easy: you start by looking at which parts of your code are the most expensive, then keep drilling down into the costly calls until you find which part of the code really is the culprit. And then you optimize it. Easy, isn't it?
If you do not have ways to do this, your tool-set sucks.
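To make the drill-down concrete, here is a minimal sketch using Python's built-in `cProfile` and `pstats` modules. It is not the tool used in this post, and the function names (`cheap_step`, `costly_step`, `workload`, `profile_top`) are made up purely for illustration.

```python
import cProfile
import io
import pstats


def cheap_step(n):
    # Hypothetical inexpensive helper: linear work.
    return sum(range(n))


def costly_step(n):
    # Hypothetical hot spot: quadratic work that should dominate the profile.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def workload():
    # Hypothetical top-level entry point that calls both helpers.
    return cheap_step(1000) + costly_step(300)


def profile_top(func, limit=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Cumulative time includes time spent in callees, which is what you
    # follow when drilling down from the entry point to the culprit.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_top(workload)
    print(report)
```

Read the report from the top: the entry point comes first, and the next most expensive call under it is where to look next. Repeat on that function until the cost stops being spread over callees and sits in one place.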
[I use the Intel Thread Checker]