I came into a project with a hydraulic component test stand that is a bit of a mess as far as calibration goes.
The stand is meant to test nearly 30 different aircraft components (valves, actuators, motors) from several OEMs, used on various aircraft of a branch of the US military.
Instrumentation-wise, there are ~10 different types of sensors amounting to ~100 analog inputs measuring voltage, current, pressure, hydraulic flow, torque, force, and linear and angular velocity. This single test stand is meant to replace all of the individual test stands the OEMs of those 30 components have been using for their testing. The test specs come from acceptance test procedures (ATPs) that the OEMs created in conjunction with the airframe manufacturer. Many of the specs in these ATPs are something like "pressure switch must turn on by 3000 psi". In many (probably most) of these cases the ATPs say nothing about the actual test benches being used or about the uncertainty of a given measurement. I can only guess (and it is just a guess) that, for this pressure switch example, 3000 psi was arrived at based on the UUT's typical performance and takes into account the uncertainty of the sensor used to measure this pressure.
We received calibration procedures from the OEM of this test stand. However, it became apparent after a couple of cal cycles that the uncertainties targeted by those procedures weren't achievable, and for a number of reasons we aren't able to go back to the test stand OEM to get this rectified. This is when my group of engineers, none of us with any formal metrology experience, became involved in trying to fix the calibration of the stand. We tackled the problem first by looking at the instrument accuracies the individual sensor manufacturers were claiming, and then, on an almost arbitrary basis, went anywhere from 1:1 to 4:1 up from there to set the uncertainties we would try to achieve with our modified cal procedures. We then combed our ATPs to see what these new uncertainties would do to the limits chosen for the parameters we will ultimately test to. We implemented the new uncertainties, which in some cases were slightly relaxed from those of the stand's OEM, and in some cases (pressure and flow, at least) we identified changes to the cal technique that helped with repeatability.
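To make the ratio step concrete, here is a minimal sketch of how a target uncertainty could be derived from a manufacturer's accuracy claim. It assumes "1:1 to 4:1 up from there" means multiplying the claimed accuracy by a factor between 1 and 4. The sensor names, accuracy values, and ratios are hypothetical placeholders, not values from our stand.

```python
# Hypothetical manufacturer accuracy claims, in engineering units.
manufacturer_accuracy = {
    "pressure_psi": 5.0,
    "flow_gpm": 0.1,
}

# Hypothetical ratios chosen per sensor type (1.0 means 1:1, 4.0 means 4:1).
chosen_ratio = {
    "pressure_psi": 1.0,
    "flow_gpm": 4.0,
}


def target_uncertainty(accuracy, ratio):
    """Relax a manufacturer accuracy claim by the given ratio."""
    if ratio < 1.0:
        raise ValueError("a ratio below 1:1 would claim better than the sensor spec")
    return accuracy * ratio


# Target calibration uncertainty for each sensor type.
targets = {
    name: target_uncertainty(accuracy, chosen_ratio[name])
    for name, accuracy in manufacturer_accuracy.items()
}

for name, value in targets.items():
    print(f"{name}: target uncertainty = {value:g}")
```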
When we felt we had fixed all of the repeatability problems, we went back to our local metrology group for buy-off on our changes. At that point they asked why we hadn't first analyzed all of the test requirements, since we shouldn't be trying to calibrate our instruments to their maximum performance if the testing doesn't need it. I understand this from the perspective of cost and of the standards needed to do the cal. But I then pointed out the "must be more than 3000 psi" type of specs to my metrology guy and asked, "What uncertainty level do we choose for that reading?" The extremes are picking an uncertainty that is super easy to calibrate to but might reject otherwise good components whose readings fall within the uncertainty range, versus calibrating the instrument as tightly as possible, which drives calibration cost and headaches. My metrology guy's solution was to contact the OEMs and find out what uncertainties they have for a given measurement. I know from experience that this simply isn't going to work for some of the OEMs: they aren't obligated to tell the Federal Government anything, and they certainly aren't going to do it for free or on a reasonable timeline.
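The trade-off between those two extremes can be sketched for the "must turn on by 3000 psi" example, treated here as a one-sided upper limit. The 5 psi and 50 psi uncertainties and the 2970 psi reading are hypothetical numbers picked only to show the effect, and the function name is my own.

```python
SPEC_LIMIT_PSI = 3000.0  # from the ATP example: switch must turn on by 3000 psi


def disposition(reading, spec_limit, uncertainty):
    """Classify a reading against an upper limit, given a measurement uncertainty."""
    if reading <= spec_limit - uncertainty:
        return "accept"
    if reading <= spec_limit:
        # The reading meets the spec on paper, but the true value could exceed it.
        return "reject (inside uncertainty zone)"
    return "reject (beyond spec)"


reading_psi = 2970.0  # hypothetical unit under test

# Tight (expensive) calibration versus loose (cheap) calibration.
tight = disposition(reading_psi, SPEC_LIMIT_PSI, uncertainty=5.0)
loose = disposition(reading_psi, SPEC_LIMIT_PSI, uncertainty=50.0)

print("tight cal:", tight)
print("loose cal:", loose)
```

The same part is accepted with the tight calibration and rejected with the loose one, which is exactly the tension between calibration cost and false rejects.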
As I see it, my preference would be to make as good an effort as is reasonable from a cost/schedule perspective to get the best uncertainties we can out of our instruments (which is what we have done), and then guardband the OEMs' limits with those uncertainties until we have tested enough parts to build up the statistics to decide whether our test limits are too tight because of the guardband. My hunch is that, for the vast majority of the test specs and given the quality of the instruments we have, we won't be rejecting a lot of components because of guardbanding. Does anyone who survived a post this long think this is the best approach given the situation described?
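For what it's worth, here is a rough sketch of the bookkeeping I have in mind for "building up the statistics". It assumes an upper limit guardbanded by subtracting the full uncertainty, and it counts how many units are rejected only because of the guardband. The readings and the 10 psi uncertainty are made-up numbers, and the function name is just illustrative.

```python
def guardband_impact(readings, spec_limit, uncertainty):
    """Tally dispositions for an upper spec limit with a guardband applied."""
    acceptance_limit = spec_limit - uncertainty
    # Units that meet the OEM limit but fall inside the guardband.
    guardband_rejects = sum(1 for r in readings if acceptance_limit < r <= spec_limit)
    # Units that fail the OEM limit outright.
    spec_rejects = sum(1 for r in readings if r > spec_limit)
    total = len(readings)
    return {
        "accepted": total - guardband_rejects - spec_rejects,
        "guardband_rejects": guardband_rejects,
        "spec_rejects": spec_rejects,
        "guardband_reject_fraction": guardband_rejects / total if total else 0.0,
    }


# Hypothetical switch actuation pressures in psi from tested units.
readings_psi = [2900.0, 2950.0, 2985.0, 2996.0, 3010.0]

summary = guardband_impact(readings_psi, spec_limit=3000.0, uncertainty=10.0)
print(summary)
```

If the guardband reject fraction stays small as real data accumulates, the guardbanded limits are probably fine; if it grows, that would be the evidence for tightening the calibration or revisiting the limit.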