At Amazon Last Mile, we deliver over 3.5 billion packages every year, making us one of the largest delivery companies in the world. At this scale, even small changes can have a large business impact, and business impact is typically assessed through controlled experimentation. The standard approach to evaluating whether a controlled experiment produced a significant change has been the t-test. However, despite our scale, the central limit theorem fails to produce a normal distribution of our metrics, and the t-test fails up to 99% of the time. In addition to exhibiting non-normal distributions, our application restricts the granularity at which control and treatment groups can be split, and it suffers from geospatial correlation that causes treatment effects to leak across both control and treatment groups at finer granularities (e.g., one delivery agent serving multiple homes in a single stop, or a building falling on different routes on different days depending on the other stops that day). This introduces a tradeoff between separability of effects at coarse granularity and detection of smaller treatment effects at fine granularity. In this paper, we resolve the t-test dilemma with a resampling test at scale, and we further leverage this test to create a scalable, repeatable methodology for choosing the randomization split granularity under these constraints. The result is a sensitivity-optimized randomization strategy, derived through a data-driven approach, that has been applied successfully in multiple real experiments at Amazon Last Mile Tech and is generalizable to any experiment.
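To make the resampling idea concrete, the sketch below shows a basic two-sample permutation test for a difference in means: group labels are repeatedly reshuffled to build an empirical null distribution, so no normality assumption is needed. This is an illustrative sketch only, not the authors' production implementation; the function name and parameters are hypothetical.

```python
import random

def permutation_test(control, treatment, num_resamples=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Instead of assuming the test statistic is normally distributed
    (as a t-test does), we reshuffle the pooled observations into
    pseudo control/treatment groups many times and count how often
    the reshuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(control) + list(treatment)
    n_treat = len(treatment)
    n_ctrl = len(control)

    extreme = 0
    for _ in range(num_resamples):
        rng.shuffle(pooled)  # randomly reassign observations to groups
        diff = (sum(pooled[:n_treat]) / n_treat
                - sum(pooled[n_treat:]) / n_ctrl)
        if abs(diff) >= abs(observed):
            extreme += 1

    # Fraction of resamples at least as extreme as the observed difference
    return extreme / num_resamples
```

In practice one would vectorize the resampling loop and run it in parallel to reach the scale described above, but the statistic and the counting logic stay the same.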