These benchmarks are built with wowsims, which is still in alpha and receives bug fixes and updates daily.
Beyond that, there are a number of issues with using simulation rankings to guide your decisions:
Simulation rankings such as these should be treated as a novelty, something interesting to look at, and nothing more.
Trinket rankings are still a work in progress.
Current Implementation:
We take the selected phase's BiS profile gearset, remove the trinkets, and simulate it to get a baseline result. We then generate a gearset for each trinket, simulate it, and compare the result against the baseline.
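As a rough illustration of that comparison, here is a minimal Python sketch. It is not the actual wowsims code: the `rank_trinkets` function, the dictionary gearset representation, the slot names, and the `toy_simulate` stand-in are all hypothetical, and the sketch assumes each trinket is tested on its own in a single trinket slot.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: slot name -> item name.
Gearset = Dict[str, str]


def rank_trinkets(
    bis_gearset: Gearset,
    trinkets: Sequence[str],
    simulate: Callable[[Gearset], float],
    trinket_slots: Sequence[str] = ("trinket1", "trinket2"),
) -> List[Tuple[str, float]]:
    """Rank trinkets by their simulated gain over a trinket-less baseline."""
    # Baseline: the BiS gearset with both trinket slots emptied.
    baseline_gear = {
        slot: item for slot, item in bis_gearset.items()
        if slot not in trinket_slots
    }
    baseline = simulate(baseline_gear)

    results = []
    for trinket in trinkets:
        # Assumption: one trinket is equipped at a time, in the first slot.
        gear = dict(baseline_gear)
        gear[trinket_slots[0]] = trinket
        results.append((trinket, simulate(gear) - baseline))

    # Largest gain over the baseline first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


def toy_simulate(gear: Gearset) -> float:
    """Stand-in for a real simulation run; the numbers are made up."""
    bonus = {"Trinket A": 120.0, "Trinket B": 75.5}
    return 1000.0 + sum(bonus.get(item, 0.0) for item in gear.values())


ranking = rank_trinkets(
    {"head": "Helm", "trinket1": "Old Trinket", "trinket2": "Other Trinket"},
    ["Trinket B", "Trinket A"],
    toy_simulate,
)
```

Because the rest of the gearset is held fixed, the gain measured for a trinket in this sketch depends on the gear it is paired with.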
This implementation comes with some caveats that can skew the results:
With all that in mind, you should treat this list only as a rough approximation of each trinket's value.
In the future, we will reforge the gearset for each trinket to get a more accurate value.