@slatestarscratchpad‘s new post on stimulant prescribing and ADHD is good.
One thing I’m curious about that was not addressed in the post is the role, in all of this, of computerized tests – specifically, “continuous performance tests.”
I had to take one of these – the TOVA (Test of Variables of Attention) – when I went in to get tested for ADHD in 2014. (I was in grad school at the time, and wanted to get tested for the same reasons as the “Senior Regional Manipulators Of Tiny Numbers” Scott talks about.) The tester said I didn’t have ADHD, and at the time I assumed my normal TOVA results weighed heavily in her decision, and (also) that this was normally how such things were decided.
But Scott’s post makes it sound like the usual procedure is a lot more of a human judgment call. He mentions a variety of things that prescribers do to make themselves feel better about their decisions, but none of them are “administer a computerized test with no human oversight and always follow what it says (or always do so unless you can think of a really good reason not to).” If nothing else, this would certainly reduce worries about human biases.
I say “if nothing else” there because the same thing would be true of any such test, even if it had no diagnostic value at all. (Then your decisions would suck – but even then, not because of your biases!) However, tests like the TOVA may indeed have a lot of diagnostic value. That is, they may have good sensitivity and specificity in discriminating controls from people with ADHD diagnoses***.
(There are even some studies showing it can discriminate these groups from people who are “faking bad,” i.e. malingering. This makes some sense if the distribution is light-tailed, e.g. normal, so that if you overdo your faking by just a little bit you’ll stray from a region where 5% of the population lives to a region where 0.01% of it does.)
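To put rough numbers on that intuition, here’s a quick sketch under a normality assumption (the assumption is mine, purely for illustration; none of these figures come from a malingering study). The 5% tail of a normal starts around z = 1.64; overshooting by one extra SD puts you past z = 2.64, where only ~0.4% of the population lives, and two extra SDs puts you past z = 3.64, in ~0.01% territory.

```python
from scipy.stats import norm

# Toy illustration of the "overshoot" intuition, assuming scores are normal.
z_5pct = norm.ppf(0.95)  # threshold that cuts off the top 5%

for overshoot_sd in (0.0, 1.0, 2.0):
    z = z_5pct + overshoot_sd
    print(f"overshoot {overshoot_sd:.0f} SD -> z = {z:.2f}, tail mass = {norm.sf(z):.4%}")
```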
For one thing, if this is true, it means that we could just automate the whole process and get roughly the same results we were getting before, but without worries about human factors getting in the way.
Additionally, if true, this is scientifically interesting, in part because of what it says about existing (non-computerized) diagnostic techniques. Scott’s post describes a very fuzzy, human process with a lot of variation between clinicians. But apparently this process has enough reliability to agree with a computerized test a lot of the time, which would not be a priori obvious.
Moreover, if (as Scott says) ADHD is one extreme of a continuous/unimodal distribution, then we could use the TOVA to figure out where clinicians are already implicitly setting the cutoff. Scott writes:
We could still have a principled definition of ADHD. It would be something like “People below the 5th percentile in ability to concentrate, as measured by this test.”
We aren’t doing this, but what we are doing may be accidentally similar to it. The Schatz et al. 2001 study, discussed further below, includes an ROC curve showing us how many false and true positives we get for various thresholds. The thresholds are for “T scores,” which are apparently like z-scores except the mean is set to 50 and the SD to 10, so that e.g. a threshold of 65 (the recommended one) means you say everyone who’s 1.5 SDs or more above the mean of the reference population has ADHD.
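Concretely, if you’re willing to assume the reference distribution is normal (which, as noted below, the norming papers don’t establish), the T = 65 cutoff translates into a percentile like so:

```python
from scipy.stats import norm

# T scores rescale z-scores so the reference mean is 50 and the SD is 10: T = 50 + 10*z.
t_cutoff = 65
z = (t_cutoff - 50) / 10      # 1.5 SDs above the reference mean
below = norm.cdf(z)           # fraction of the reference population below the cutoff
print(f"T = {t_cutoff} -> z = {z:.1f}, flags the top {1 - below:.1%} of the reference population")
# roughly the top 6.7%, if the distribution really is normal
```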
If everything were normally distributed, you could get quantiles out of this, and translate clinical behavior into cutoffs separating X% of the population from (100-X)% of it. (Well, sort of – the “reference population” here is neither the full population nor the non-ADHD population, it’s sort of a mixture determined by the selection criteria used to make the normative stats.) Of course, as usual, the people who made the reference stats don’t say anything about whether the distribution was normal. But this kind of analysis could be done by someone, in principle, anyway.
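To make the mixture caveat concrete, here’s a throwaway simulation with made-up parameters (a 5% ADHD rate in the normative sample and a 1 SD upward shift for the ADHD group are my assumptions, not anything from the TOVA norms). The point is just that the same T = 65 cutoff picks out different fractions of the normative mixture, of the non-ADHD subgroup, and of the ADHD subgroup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical normative sample: 95% non-ADHD scores ~ N(0, 1), 5% ADHD scores ~ N(1, 1).
n = 1_000_000
is_adhd = rng.random(n) < 0.05
raw = np.where(is_adhd, rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n))

# T scores are computed against the mixture's own mean and SD.
t = 50 + 10 * (raw - raw.mean()) / raw.std()
cutoff = 65

print(f"above T=65, whole reference mixture: {(t >= cutoff).mean():.2%}")
print(f"above T=65, non-ADHD subgroup only:  {(t[~is_adhd] >= cutoff).mean():.2%}")
print(f"above T=65, ADHD subgroup only:      {(t[is_adhd] >= cutoff).mean():.2%}")
```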
(***Caveat: the most widely cited study I could find on this is Forbes 1988, which – astonishingly – was not blinded. That is, the TOVA was administered in the process of making the diagnostic decisions against which it was later compared, and its results were [Forbes’ words] “usually known before the final diagnosis was made.” Forbes goes on to claim that different TOVA results would not have flipped any of the diagnoses, to which my reaction is “okay, great, so if that was true, why did you show them to the clinicians at all?”
However, there are also studies like Schatz et al. 2001 that give the TOVA to people who have already had a formal diagnosis done before the study started, and also to controls. There are still worries like “are we sure the original diagnoses didn’t use the TOVA or a similar test?” and “given our screening procedures for controls, what base rate of undiagnosed ADHD should we expect in our control population, i.e. how sure are we that some of our control ‘false positives’ weren’t true positives?”, so I’m still not impressed with the evidence quality I’ve seen. That said, if you grant for the sake of argument that Schatz et al. did things right, they get good sensitivity/specificity results too. Oddly, they interpret their results as bad news for the TOVA, on the basis that it does worse than a test based on parent ratings, but since the original diagnoses themselves involved parent ratings, this doesn’t seem like a fair/useful basis for comparison.)
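For what it’s worth, the control-contamination worry in that footnote is easy to put a toy number on, if you’re willing to invent the inputs (the sensitivity, specificity, and undiagnosed-ADHD rate below are all my assumptions, not figures from Schatz et al.): undiagnosed cases among the controls make the measured specificity look worse than the true one.

```python
# Back-of-the-envelope version of the "contaminated controls" worry.
true_sens, true_spec = 0.90, 0.90   # hypothetical true test performance
undiagnosed_rate = 0.05             # hypothetical fraction of "controls" with undiagnosed ADHD

# Positives from undiagnosed cases get scored as false positives.
observed_fpr = undiagnosed_rate * true_sens + (1 - undiagnosed_rate) * (1 - true_spec)
observed_spec = 1 - observed_fpr
print(f"true specificity: {true_spec:.1%}, observed specificity: {observed_spec:.1%}")
# -> observed specificity ~86%: the test looks somewhat worse than it actually is
```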