Zack Morris wrote:Typhoon wrote:
Not in my experience in my former or current fields.
What you're describing is scientific fraud - a strong allegation. Do you have anything other than personal anecdote to back it up?
I could name names but I'd rather not.
It's better not to.
Zack Morris wrote: I can offer more details, though. A good friend of mine whose adviser is a prominent statistics professor at my university (whose stats program is certainly in the top 5 nationally) tells me he has frequently come across papers whose analyses are either not reproducible (because of major omissions by the authors) or not at all applicable beyond the extremely narrow, cherry-picked data selected for the papers. Regarding cherry picking, it's common practice, he tells me. Most recently, his adviser, whose specialty is financial modeling, asked him to collaborate on a paper having to do with hedge fund returns. I don't know the technical details but I presume it was related to the topic of fund cloning. They had a concept and developed a model around it. When tested against the actual data, it didn't look so hot anymore. The professor's advice? 'Just pick 4 or 5 points for which it looks good.'
Again, this is all anecdotal. I don't think one should condemn an entire field on hearsay. One would really have to go through the papers in question.
On the other hand, there is the ever-present "publish or perish" pressure, with cutthroat competition for grants, so, given human nature,
it's not too surprising that some people will choose to cut corners. Ideally, peer review and replication should weed this out, but given that most
published papers are read only by the authors and the reviewers, and given the exponential growth in the number of journals, it's not practical.
Zack Morris wrote: I've heard from a CS professor (at our very well regarded CSE dept.) that it's much the same in the field of image processing, which is far from some sort of bogus science.
Well, I am a bit familiar with specific subfields of the vast field of image processing and I have not come across rampant fudging in those specific subfields.
So again, it's my hearsay versus your hearsay.
Certain aspects of image processing are among the most difficult current algorithmic challenges.
For example, segmentation of complex natural images: all current algorithms
will fail on some subset of such images.
However, that does not mean that they are in any way bogus.
There are canonical databases, such as the
Berkeley Segmentation Dataset, against which algorithms are tested and results are reported.
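To make the benchmarking point concrete, here is a minimal sketch (not drawn from any particular paper) of scoring a predicted binary segmentation mask against a ground-truth mask using the intersection-over-union (Jaccard) metric. Note that the actual BSDS benchmark reports boundary precision/recall rather than region IoU, so this is purely illustrative.

```python
import numpy as np

def iou(pred, truth):
    """Intersection-over-union between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return np.logical_and(pred, truth).sum() / union

# Toy data only: random masks stand in for an algorithm's output and a
# human-annotated ground truth from a benchmark database.
rng = np.random.default_rng(0)
predicted = rng.random((64, 64)) > 0.5
ground_truth = rng.random((64, 64)) > 0.5
print(f"IoU = {iou(predicted, ground_truth):.3f}")
```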
Zack Morris wrote:In my own field, which is very heavy on modeling, I've been disillusioned by what I've seen. Our group's alumni have all gone on to relatively successful careers at places like Intel. Imagine my dismay at discovering that I can't ever seem to replicate their results, even when I manage to dig up the code they actually wrote. In one case, the interpolation scheme a colleague used to implement a published model was obviously incorrect: it did not reduce to the discrete set of equations in the limit of the sample spacing being reduced to 1. I've spent a whole year trying to replicate published models with absolutely zero success. And I know why: the authors took tremendous liberties implementing the equations they list in their manuscripts. The code is not available because they don't want anyone to see the fudge factors, the magical initial conditions, and little tweaks customized on a case-by-case basis. They always compare to a few experiments but you can bet the models break down as soon as they are applied to any other data.
Sounds like your group would benefit from remedial courses in numerical methods.
Numerical methods can be both subtle and complex. As such, they are under-emphasized, with many grads learning them [poorly] on the fly.
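As a small illustration of the kind of check that often gets skipped, here is a sketch of a basic convergence test: refine the step size on a problem with a known analytical answer and verify that the error falls off at the scheme's expected order. The function and step sizes below are arbitrary illustrative choices.

```python
import numpy as np

# Verify the order of accuracy of two finite-difference approximations to
# f'(x) against the exact derivative (f = sin, f' = cos, evaluated at x0).
f, dfdx, x0 = np.sin, np.cos, 1.0

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    fwd = (f(x0 + h) - f(x0)) / h              # forward difference, O(h)
    cen = (f(x0 + h) - f(x0 - h)) / (2.0 * h)  # central difference, O(h^2)
    print(f"h={h:.0e}  fwd err={abs(fwd - dfdx(x0)):.2e}  "
          f"cen err={abs(cen - dfdx(x0)):.2e}")
# The forward-difference error should drop ~10x per decade of refinement,
# the central-difference error ~100x, until round-off starts to dominate.
```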
Zack Morris wrote:These aren't empirical fits, by the way, but chemical rate equations with a solid analytical basis.
Chemical rate equations? Have you investigated the stochastic simulation method? It is quite robust compared with the many numerical stability, boundary sensitivity, and initial condition problems that can plague conventional continuum ODE or PDE methods, and the time steps are intrinsically determined by the rate constants. This method, which solves the equivalent chemical master equation, has been used mostly in computational biology, where the approximation of a continuum concentration can break down. A few years ago I read a paper from LANL that provided a significant decrease in the order complexity of the algorithm, and there has also been some recent work on implementations on GPUs. Obviously I don't know if it's applicable to your work, but I thought I'd mention it in passing, as few people know about it.
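For anyone curious, here is a minimal sketch of the direct (Gillespie) stochastic simulation algorithm behind that approach; the reversible isomerization example and its rate constants are made up purely for illustration, not taken from any paper discussed here.

```python
import numpy as np

def gillespie_ssa(x0, stoich, propensity, t_max, rng=None):
    """Direct-method stochastic simulation of a well-mixed reaction system.

    x0         : initial copy numbers of each species
    stoich     : (n_reactions, n_species) state-change matrix
    propensity : function mapping state x to reaction propensities a_j(x)
    t_max      : simulation end time
    """
    rng = rng or np.random.default_rng()
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while t < t_max:
        a = propensity(x)
        a0 = a.sum()
        if a0 <= 0.0:                      # nothing can fire; system frozen
            break
        t += rng.exponential(1.0 / a0)     # waiting time ~ Exp(total rate)
        j = rng.choice(len(a), p=a / a0)   # pick which reaction fires
        x += stoich[j]                     # apply its stoichiometric change
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Toy example: reversible isomerization A <-> B (illustrative rates only).
k1, k2 = 1.0, 0.5
stoich = np.array([[-1, 1],    # A -> B
                   [ 1, -1]])  # B -> A
propensity = lambda x: np.array([k1 * x[0], k2 * x[1]])
times, states = gillespie_ssa([100, 0], stoich, propensity, t_max=10.0)
```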
Zack Morris wrote:This summer, I corresponded with one author about a very simple model he had implemented. It was a perfect test case. The code was not available but he was very explicit about his initial conditions and his choice of discretization. When asked for the code, he didn't even give a reason, but simply refused. Very well, I wrote my own. I wasn't at all surprised when it didn't work. So I sent it to him. He made some trivial tweaks and sent it back, saying it now looked correct. Did it reproduce his results? Nope. Not at all.
Everyone can and does make mistakes in algorithms. Are you sure
you got the implementation right?
Zack Morris wrote:My favorite examples involve hunting down PhD theses describing similar models only to find that not only is the code unavailable, but their formulas are flat out incorrect and could not have been implemented as written. Because of how utterly stupid the scientific journal system is, authors conceal these things with brevity and generality.
The better journals have a policy of requiring that data, and any relevant analysis code, associated with a published paper be archived for independent analysis.
This should be standard policy for all journals. That it is not is a current failing.
Zack Morris wrote:I have a couple of colleagues working in MEMS-oriented groups at our department. One of the groups has even received national media attention. They both tell me that journal papers are often not completely reproducible because devices that are described are often the single one out of perhaps ten or more attempts that actually functioned. There is an enormous amount of variability in these processing techniques, everyone understands that, but nobody ever talks about failed cases or problems encountered. All of that is swept under the rug.
It took an enormous industrial R&D effort to go from the lab to the dead-pixel-free large LCD displays that we now take for granted. The MEMS situation is likely analogous.
I was recently talking about OLEDs with a company guy from Sony. He said that the current rejection rate for faulty industrial OLED displays is about 30%.
Zack Morris wrote:Returning to my field, crucial experimental parameters are omitted because they are difficult to measure. Initial concentrations, for example, and boundary conditions are simply made up when attempting to model the experiments, and then elaborate justifications are made for why the model itself is still accurate.
Experimental science is a difficult undertaking.
Every model has assumptions, the question is whether or not the assumptions are physically realistic.
If you're in a new field wherein people are still trying to figure out how to do things, consider yourself fortunate.
Any advance you publish may make you one of the "grand old men" of the field, even if it seems trivial in the future.
Basic research is what I'm doing when I don't know what I am doing.
~ Wernher von Braun
Zack Morris wrote:I think you have to take this stuff for what it is: imperfect but sometimes qualitatively valuable. Consider density functional theory. It's garbage when it comes to making quantitative predictions concerning some of the best known material systems to man. We all know that. But we still use it because within some range of limits, it is a useful tool. Someone not familiar with the technique could easily start a blog ridiculing DFT and its inability to predict the silicon band gap, or the essentially unobservable nature of predictions of molecular structure. Likewise for molecular dynamics, which is little more than crude empirical fits and classical mechanics, and is widely used (almost unquestioningly) to study very complex biological phenomena.
It's a flawed analogy. The underlying physics on which
DFT is based, QM, is known.
Unlike in climate research, the known problems and limitations of DFT are out in the open. No one is trying to hide them.
Someday, someone will come up with an improvement on current DFT. It's a very active area of research.
Analogous open problems are solving the Navier-Stokes equations of fluid dynamics in the turbulent regime,
the QCD confinement problem, and the existence of
Yang-Mills theory and the mass gap
to name but three.
In the case of the earth's global climate, a far more complex open system, the underlying factors driving the dynamics are still being discovered.
At a minimum, current models are thus incomplete.
Zack Morris wrote:In light of all that I've seen, I don't have any reason to be surprised by the so-called 'Climategate' emails. The people most likely to hold them up as evidence, in my experience, are not scientists.
Well, for the sake of argument, let's take what you've written at face value.
Then you've provided an excellent and highly compelling case as to why so-called climate research should be
completely discounted when it comes to economic and other policy.
Zack Morris wrote:Odd. I read several papers per week and often have questions for the authors. Most have been both prompt and forthcoming with their replies.
How often do they send over their complete data and code (if applicable)?
We typically [re]produce the code ourselves from the papers and test it against similar, but not identical, data.