Given the deluge of “expert” opinion online about the Philippine VP race, we decided to do our own due diligence in search of the truth. Special thanks to Reina Reyes for the dataset.
Given the data that are readily available, we want an approach that is quick and simple. In our opinion, these data are not sufficient to prove whether or not fraud was committed, but they are useful for checking whether or not BBM’s claim is reasonable.
BBM claims that at some point in time (roughly 3 AM on May 10, Philippine time) the transmitted data were rigged, paving the way for Leni’s eventual lead in the VP race. One nice feature of such a claim is that it implies two sets of data, separated in time, which we can exploit: (1) a set of clean, “un-manufactured” data (i.e. votes transmitted before ~3 AM), and (2) a set of allegedly fraudulent, manufactured data (i.e. votes transmitted after ~3 AM).
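To make the split concrete, here is a minimal Python sketch of partitioning records at the claimed cutoff and comparing Leni's share on each side. This is only an illustration under our own assumptions, not the exact code behind the analysis: the field names (`transmitted_at`, `leni`, `bbm`), the helper functions, the year in the cutoff, and the toy numbers are all made up for the example and do not come from the dataset.

```python
from datetime import datetime

# Assumed cutoff: "roughly 3 AM on May 10, Philippine time".
# The year and the exact minute are assumptions made for this sketch.
CUTOFF = datetime(2016, 5, 10, 3, 0)


def split_by_cutoff(records, cutoff=CUTOFF):
    """Partition transmission records into (before, after) the cutoff."""
    before = [r for r in records if r["transmitted_at"] < cutoff]
    after = [r for r in records if r["transmitted_at"] >= cutoff]
    return before, after


def vote_share(records, candidate):
    """Fraction of the two-candidate (Leni + BBM) vote won by `candidate`."""
    total = sum(r["leni"] + r["bbm"] for r in records)
    won = sum(r[candidate] for r in records)
    return won / total if total else float("nan")


# Toy records with made-up numbers, for illustration only (NOT real counts).
toy_records = [
    {"transmitted_at": datetime(2016, 5, 9, 22, 0), "leni": 40, "bbm": 60},
    {"transmitted_at": datetime(2016, 5, 10, 1, 30), "leni": 45, "bbm": 55},
    {"transmitted_at": datetime(2016, 5, 10, 4, 0), "leni": 55, "bbm": 45},
    {"transmitted_at": datetime(2016, 5, 10, 9, 0), "leni": 60, "bbm": 40},
]

before, after = split_by_cutoff(toy_records)
print("records before/after cutoff:", len(before), len(after))
print("Leni share before cutoff:", vote_share(before, "leni"))
print("Leni share after cutoff:", vote_share(after, "leni"))
```

A difference in share between the two sets does not indicate anything by itself, since later transmissions may simply come from different regions. The point of the split is only to give us two samples to compare.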
From a statistical standpoint, every dataset has an underlying distribution that depends on the fundamental process(es) that generated it. To manufacture data convincingly, one must go to great lengths to preserve those underlying distributions. Aside from the difficulty of generating fake data