Here’s a quote I enjoyed from an interview with John D. Cook:
It’s easy to get caught in circular reasoning. For example, how do you decide what data points are outliers? They are points that have low probability under your model. So you throw them out. Then, lo and behold, everything that’s left fits your model!
So how do you break out of the circle? You can start by visualizing your data. And after you select a model, validate it. If you’re fitting a model in order to make predictions, and your model indeed does make good predictions on new data, you can have some confidence that you’re not just playing mental games and that your model may be an approximation of reality. (emphasis added)
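To make the circle concrete, here is a minimal sketch in Python. It is not from the interview; the synthetic data, the two-RMSE cutoff for "outliers", and the helper names `fit_line`, `rmse`, and `sample` are all assumptions chosen for illustration. It fits a straight line, discards the points the fitted line finds unlikely, and then scores the same line both on what is left and on fresh data it has never seen.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def rmse(points, model):
    """Root-mean-square prediction error of the line on the given points."""
    a, b = model
    return (sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)) ** 0.5


rng = random.Random(0)


def sample(n):
    """Hypothetical data: a linear trend, unit noise, and occasional large shocks."""
    pts = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        noise = rng.gauss(0, 1)
        if rng.random() < 0.1:  # assumed 10% contamination rate
            noise += rng.gauss(0, 8)
        pts.append((x, 2 * x + 1 + noise))
    return pts


train = sample(200)
new_data = sample(200)  # stands in for data collected after the model was chosen

model = fit_line(train)
in_sample = rmse(train, model)

# The circle: call a point an outlier because the model says it is unlikely,
# drop it, then score the same model on whatever survives.
a, b = model
kept = [(x, y) for x, y in train if abs(y - (a * x + b)) <= 2 * in_sample]
trimmed_in_sample = rmse(kept, model)

# Breaking the circle: score the model on data it had no hand in selecting.
out_of_sample = rmse(new_data, model)

print(f"in-sample RMSE:            {in_sample:.2f}")
print(f"after dropping 'outliers': {trimmed_in_sample:.2f} ({len(train) - len(kept)} points removed)")
print(f"RMSE on new data:          {out_of_sample:.2f}")
```

The trimmed error can never be larger than the original in-sample error, because the discarded points are exactly the ones with the largest residuals. That improvement says nothing about the model. Only the last number, computed on points that were not filtered through the model, gives any evidence about how well it predicts.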