NMR Metabolomics of cholesterol lipids and heart disease/diabetes/etc

(Alex K Chen) #1


This is a really fascinating paper that does A LOT to deconstruct the cholesterol myth.

Concentrations of βOH-butyrate (BHB):

bOHbutyrate (mmol/l), given as value (range) per group: 0.041 (0.028–0.065), 0.051 (0.024–0.123), 0.049 (0.035–0.081), 0.042 (0.028–0.067), 0.083 (0.058–0.127)

(Bacon is a many-splendoured thing) #2

The problem with neural network modeling is that the network absorbs the preconceptions of its teachers. I’m not sure how one goes about training a network to be independent of what it is taught. For example, the paper notes that the neural network identified a strong correlation between glucose level and Type II diabetes, but given that glucose level is the very diagnostic criterion, why is this a surprise? It should be a given that the neural network would pick up on an association between a disease and its diagnostic. And if it didn’t, would the project leaders think to question the diagnostic, or would they assume that their neural network hadn’t learned properly? For example, if the network said that lipid panels were irrelevant to assessing cardiovascular risk, would they believe it? Probably not, since one of their data sources was a study of the effects of simvastatin.

My other concern is the quality of the input data. Classifying cases and outcomes is tricky, and how does one ascribe relative importance to multiple disease conditions? What is primary, and what secondary? Can we trust the clinical assessments of the original observers who provided the data?

I have to confess I didn’t understand the ins and outs of the paper, but a surface reading raised these concerns. Perhaps the paper answered them and I just didn’t catch it; we’ll have to see. I look forward to learning more.

(Bob M) #3

That’s an intense study.

I think they used one source for the training, and other sources for validation. The Pravastatin study, for instance, was used for validation, not training.

To train a neural network (NN), they feed data into it and set the output to be whatever they are looking for. For instance, if it’s a NN to find cat pictures, they input different pictures, and the output is whether or not the input picture is a picture of a cat. That’s often in the form of a probability, e.g., “there’s a 90% chance the picture is of a cat”. They keep doing this until the NN meets some set of criteria indicating it’s as good as it’s going to get.
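That loop can be sketched in a few lines. Here’s a toy version using plain logistic regression as a stand-in for a real network; all the data, features, and stopping thresholds are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pictures": 200 examples with 5 made-up features each,
# labeled 1 ("cat") whenever the feature sum is positive.
X = rng.normal(size=(200, 5))
y = (X.sum(axis=1) > 0).astype(float)

w = np.zeros(5)   # model weights, start at zero
b = 0.0

def predict(X):
    # Sigmoid squashes scores into "probability this is a cat"
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for step in range(2000):
    p = predict(X)
    # Gradient of the cross-entropy loss between prediction and label
    grad_w = X.T @ (p - y) / len(y)
    grad_b = (p - y).mean()
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b
    # Stopping criterion: "as good as it's going to get"
    if np.abs(p - y).mean() < 0.05:
        break

print(f"training accuracy: {((predict(X) > 0.5) == y).mean():.2f}")
```

A real NN has many layers of weights instead of one vector, but the cycle is the same: predict, compare to the target, nudge the weights, repeat.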

Typically, what’s done next is they use a separate set of pictures to see how well the NN actually does.

And though I’ve used the example of a single output, you could have multiple outputs: cat; dog; horse; person; etc. These may be likelihoods, so you’d have to compare the outputs to determine which entity the network considers most likely.
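The multi-output case is just a vector of scores turned into likelihoods and compared. A sketch, with the class names and raw scores made up:

```python
import numpy as np

def softmax(scores):
    # Shift by the max for numerical stability, then normalize
    # so the outputs sum to 1 and read as likelihoods.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

classes = ["cat", "dog", "horse", "person"]
scores = np.array([2.1, 0.3, -1.0, 0.5])   # raw network outputs (invented)

probs = softmax(scores)
best = classes[int(np.argmax(probs))]      # the most likely entity
print(best, probs.round(3))
```

Here the “cat” score dominates, so the network’s answer is cat with roughly 70% likelihood.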

I think that’s what they did here: they used the UK Biobank cohort for training; then did validation using other “cohorts”. In other words, using other datasets, how well does our model (NN) actually perform at showing who will get what disease?
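The train-on-one-cohort, validate-on-another workflow can be sketched like this; the cohort generator, the marker-to-disease rule, and the simple logistic model are all invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_cohort(n, shift=0.0):
    # Fake "blood data": 3 markers; disease tracks marker 0 plus noise
    X = rng.normal(loc=shift, size=(n, 3))
    y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(float)
    return X, y

X_train, y_train = make_cohort(500)        # "UK Biobank" stand-in
X_valid, y_valid = make_cohort(200, 0.2)   # separate validation cohort

# Fit a simple logistic model on the training cohort only
w, b = np.zeros(3), 0.0
for _ in range(3000):
    p = 1 / (1 + np.exp(-(X_train @ w + b)))
    w -= 0.5 * X_train.T @ (p - y_train) / len(y_train)
    b -= 0.5 * (p - y_train).mean()

# The honest number is performance on a cohort the model never saw
p_valid = 1 / (1 + np.exp(-(X_valid @ w + b)))
acc = ((p_valid > 0.5) == y_valid).mean()
print(f"validation accuracy: {acc:.2f}")
```

If the model only memorized quirks of the training cohort, the validation number collapses; that’s the whole point of keeping the cohorts separate.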

They say it’s good, of course. :wink:

It will take a lot more reading and deciphering to really analyze the paper, though. Sadly, time I don’t have.

(Bacon is a many-splendoured thing) #4

The problems come, however, when the NN identifies a picture as a cat, whereas the researchers wrongly believe that it is not a cat, or vice versa. They need to be open to the possibility that the NN got it right, and say, “By golly, that is a cat, who knew?” or whatever.

As we programmers like to say, garbage in, garbage out, so the training is only as good as the input data—and it’s very hard to train people, to say nothing of neural networks, to be iconoclasts.
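One way to see “garbage in, garbage out” concretely: train the same toy model once on correct labels and once on scrambled ones, then score both against the truth. Everything here is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)               # clean ground truth
garbage = rng.integers(0, 2, size=len(y)).astype(float) # scrambled labels

def train_and_score(labels):
    # Simple logistic model; scored against the CLEAN labels either way
    w, b = np.zeros(2), 0.0
    for _ in range(2000):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= 0.5 * X.T @ (p - labels) / len(labels)
        b -= 0.5 * (p - labels).mean()
    return (((X @ w + b) > 0) == y).mean()

clean_acc = train_and_score(y)
garbage_acc = train_and_score(garbage)
print(f"clean labels: {clean_acc:.2f}, garbage labels: {garbage_acc:.2f}")
```

The model trained on scrambled labels can do no better than a coin flip, no matter how long it trains: the learning machinery is fine, the input was garbage.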

(Chuck) #5

@PaulL As a retired IT professional, electronics technician, engineer, and software and hardware quality assurance engineer, I totally agree with you. People, programmers included, only see what they expect and want to see most of the time. I found so many programming errors that were a simple misplaced period or comma; the programmer was looking right at it and couldn’t see it. I have seen the same with doctors and research scientists trying to figure out what was wrong with my mother and my first wife. It is sad, but human. And most computer errors are human errors, in reality.

(Bob M) #6

I see no evidence that’s what they did. All they did (it seems to me) was use data that was available. That data included endpoints such as whether or not someone got, say, skin cancer.

I guess it’s possible for someone to write down T2D, when that was incorrect, but it seems unlikely.

To me, the endpoints are known and fixed. The inputs are (mainly or maybe completely) blood data.

There does not seem much room for “garbage in, garbage out”.

(Joey) #7


(*problem exists between keyboard and chair)