Primary Care Autism Screening and Later Autism Diagnosis

Abstract

OBJECTIVES: To describe the proportion of children screened by the Modified Checklist for Autism in Toddlers (M-CHAT), identify characteristics associated with screen completion, and examine associations between autism spectrum disorder (ASD) screening and later ASD diagnosis.

METHODS: We examined data from children attending 18- and 24-month visits between 2013 and 2016 from 20 clinics within a health care system for evidence of screening with the M-CHAT and subsequent coding of ASD diagnosis at age >4.75 years. We interviewed providers for information about usual methods of M-CHAT scoring and ASD referral.

RESULTS: Of 36 233 toddlers, 73% were screened and 1.4% were later diagnosed with ASD. Hispanic children were less likely to be screened (adjusted prevalence ratio [APR]: 0.95, 95% confidence interval [CI]: 0.92–0.98), and family physicians were less likely to screen (APR: 0.12, 95% CI: 0.09–0.15). Compared with unscreened children, screen-positive children were more likely to be diagnosed with ASD (APR: 10.3, 95% CI: 7.6–14.1) and were diagnosed younger (38.5 vs 48.5 months, P < .001). The M-CHAT’s sensitivity for ASD diagnosis was 33.1%, and the positive predictive value was 17.8%. Providers routinely omitted the M-CHAT follow-up interview and had uneven referral patterns.

CONCLUSIONS: A majority of children were screened for ASD, but disparities exist among those screened. Benefits for screen-positive children are improved detection and younger age of diagnosis. Performance of the M-CHAT can be improved in real-world health care settings by administering screens with fidelity and facilitating timely ASD evaluations for screen-positive children. Providers should continue to monitor for signs of ASD in screen-negative children.

Abbreviations:

APR: adjusted prevalence ratio
ASD: autism spectrum disorder
CI: confidence interval
EHR: electronic health record
EIP: Early Intervention Program
ICD: International Classification of Diseases
IHC: Intermountain Healthcare
M-CHAT: Modified Checklist for Autism in Toddlers
NPV: negative predictive value
PPV: positive predictive value
SES: socioeconomic status

What’s Known on This Subject:

Universal autism screening in toddlers is recommended, but it is unknown how frequently this occurs, what factors are associated with screening, and the performance characteristics of the most commonly used screening instrument in real-world health care settings.

What This Study Adds:

Autism screening was completed in the majority of toddlers but was less likely to occur in Hispanic children. Children who screened positive were more likely to be diagnosed with autism and were diagnosed earlier, but false-negative screens were common.

Early identification of autism spectrum disorder (ASD), through developmental surveillance and screening, allows children access to ASD-specific behavioral interventions that improve long-term outcomes.^1–3 Developmental surveillance that entails history-taking and observation for signs of ASD lacks sensitivity, especially during brief visits.⁴ In 2007, to improve sensitivity and lower age of ASD diagnosis, the American Academy of Pediatrics recommended ASD screening at 18- and 24-month visits.² Despite this, there has not been a significant decrease in age of ASD diagnosis over the ensuing decade.⁵

Delayed identification may be related to several factors, including low rates of ASD screening, lower performance characteristics of ASD screening instruments in “real-world” practice, and barriers to screen-positive children being evaluated for ASD in a timely fashion. Among pediatricians, 81% report routinely administering ASD screening tools,⁶ most commonly the Modified Checklist for Autism in Toddlers (M-CHAT) or the Modified Checklist for Autism in Toddlers, Revised.^3,7 Studies have shown that use of the M-CHAT leads to identification of ASD at younger ages. However, these studies were performed in practices with research support.^3,7 In a recent longitudinal population-based study of screening at 18 months, authors reported a low sensitivity of the M-CHAT in identifying children with ASD due to a high proportion of false-negative screens.⁸ Additionally, the recommended M-CHAT follow-up interview for screen-positive children may be difficult to complete and is often omitted in community practices, affecting performance characteristics.^7,9–11 Therefore, studies in which researchers examine the M-CHAT in practices without support for scoring or completion of the screen and cost-free ASD evaluations are needed to identify barriers to early identification of children with ASD.

With our study, we had 6 objectives: (1) estimate the rate of ASD screening at 18- and 24-month visits, (2) identify factors associated with screen completion, (3) evaluate whether screening is associated with improved identification of ASD, (4) determine if ASD screening was associated with a younger age of ASD diagnosis, (5) report performance characteristics of the M-CHAT, and (6) obtain qualitative information about M-CHAT administration from participating practices.

Methods

We analyzed electronic health record (EHR) data from children aged 16 to 30 months attending 18- and 24-month health supervision visits between 2013 and 2016 at Intermountain Healthcare (IHC) clinics in Utah. We included visits to clinics whose providers used an EHR M-CHAT result field that had been added in 2013, allowing providers to document whether the M-CHAT was completed (yes or no) and the result (negative [pass] or positive [fail]). To identify factors associated with M-CHAT completion, we analyzed visit-level variables (18- or 24-month visit, year of visit, provider type [pediatrician, family physician, advanced practice provider]) and patient and family variables (sex, race and ethnicity, insurance type, family location, and Area Deprivation Index [proxy measure of socioeconomic status (SES) that utilizes US Census Bureau data on poverty, housing, employment, and education¹²]). We identified children in our cohort diagnosed with ASD by International Classification of Diseases (ICD) codes (ICD-9 299.xx, ICD-10 F84.x) entered in visits up to May 31, 2019 (aged 57–107 months at time of data pull). In Utah, ASD diagnostic services are provided by IHC specialty clinics, in university-based clinics, in Title V–supported clinics administered by the Utah Department of Health, and by providers in private practice. We estimated the age of ASD diagnosis as the age at which the first ASD code was entered into the EHR, whether by the diagnosing professional or by another IHC provider who saw the child and learned of the diagnosis on that date. To examine associations between ASD screening and ASD identification, we compared the prevalence of ASD at the time of our data pull in screened children with the prevalence in children who were not screened. We also compared the prevalence of ASD in screen-positive children with the prevalence of ASD in children who screened negative. 
To assess for associations between ASD screening and age of ASD diagnosis, we compared the age of first ASD diagnosis among screened children with those not screened as well as in those who screened positive with those who screened negative. By examining the screening result (positive or negative) and later diagnosis of ASD among all children included in our cohort, we were able to determine the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the M-CHAT. Finally, we obtained qualitative data by contacting physicians in participating practices. We asked how the M-CHAT was administered, how the first stage of the M-CHAT was typically scored, if the follow-up interview was administered when indicated, and the referral pattern for screen-positive children.

Our data included 3 levels of nesting: provider visits (1 or 2 visits) nested within patient, patients nested within provider, and providers nested within clinics. There was no interest in a within-patient analysis (visit 1 compared with visit 2), so regression models were fitted for either a visit-specific result or a combined-visit result, eliminating the need to include a visits-within-patient level of nesting in the models. We used multilevel regression models to account for the remaining 2 levels of nesting: patients nested within provider, and providers nested within clinics. For describing patient and family and visit characteristics, we used ordinary descriptive statistics, rather than estimates derived from multilevel models. We modeled age of ASD diagnosis by receipt of screening (yes or no) and result of screen (positive or negative) using multilevel linear regression. We modeled ASD screening prevalence (visit-level data) and the association of screening and ASD diagnosis (child-level data) using multilevel binary Poisson regression with robust SEs. We chose binary Poisson regression because, for cohort or cross-sectional study designs, prevalence ratios can be accurately estimated directly by exponentiating the regression coefficient.^13,14 We added visit and patient and family variables to these models to obtain adjusted mean ages and adjusted prevalence ratios (APRs). We calculated test characteristics (sensitivity, specificity, PPV, NPV) of the M-CHAT in predicting a later diagnosis of ASD using simple 2 × 2 tables, with no adjustment for the multilevel structure in the data. Given the small amount of missing data (2.2%), all analyses are complete case analyses, in which patients with missing data were dropped from the analyses. 
When <5% of data are missing, particularly with large sample sizes, the complete case analysis approach provides sufficiently unbiased estimates.¹⁵ All reported P values are for 2-sided comparisons using the Stata-15 statistical software (StataCorp LLC, College Station, TX).
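The reason exponentiating a coefficient yields a prevalence ratio can be sketched for the simplest case of a single binary predictor. The notation below (Y for the binary outcome, x for the predictor, β₀ and β₁ for the coefficients) is generic and assumed for illustration; it omits the covariates and the provider- and clinic-level random effects of the models actually fitted.

```latex
% Log-link model for a binary outcome Y and a binary predictor x (illustrative notation)
\log \mathbb{E}[Y \mid x] = \beta_0 + \beta_1 x
% Because Y is binary, E[Y | x] is the prevalence Pr(Y = 1 | x), so
\mathrm{PR}
  = \frac{\Pr(Y = 1 \mid x = 1)}{\Pr(Y = 1 \mid x = 0)}
  = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}}
  = e^{\beta_1}
```

Adding covariates to the linear predictor leaves the interpretation unchanged: the exponentiated coefficient is then the prevalence ratio with the other covariates held fixed.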

The University of Utah Institutional Review Board determined the study was exempt from review.

Results

Demographics

There were 48 307 children with 18- and 24-month visits during the study period. We excluded visits by children seen at clinics not routinely using the M-CHAT result field, leaving data from 36 223 children seen during 59 139 visits at 20 clinics by 172 providers (Fig 1). Patient and family demographic variables and visit-level characteristics are shown in Tables 1 and 2. Our cohort consisted of predominantly non-Hispanic white children. The majority lived in urban or suburban areas, were privately insured, and were from families from higher socioeconomic backgrounds. Most visits were completed by pediatricians.

TABLE 1

Patient and Family Characteristics

TABLE 2

Visit Characteristics

FIGURE 1

Flowchart of ASD screening and diagnosis for all patients studied.

M-CHAT Screening

Among 36 223 children, 72.8% were screened at either the 18- or the 24-month visit; among those who attended 18- and 24-month visits (n = 20 072), 54.4% were screened twice (Table 3). More children were screened at the 24-month visit than at the 18-month visit (66% and 62%; P < .001), but fewer had positive screens (1.6% vs 2.4%; P < .001). A total of 378 (72%) of the 522 children diagnosed with ASD had been screened. Of these, 228 (60.3%) were screened once; 150 (39.7%) were screened twice. A total of 165 (72%) of the 228 children with ASD who had been screened once (at either 18- or 24-month visits) screened negative. Eighty-eight (59%) of the 150 children screened twice screened negative at both visits. Thus, of the 378 children later diagnosed with ASD, 253 (67%) had screened negative (Fig 1). Table 4 shows the factors associated with M-CHAT completion. In univariable analysis, Hispanic children and those from lowest socioeconomic backgrounds were least likely to be screened, and family physicians were less likely to screen. In multivariable analysis, visits with Hispanic children (APR: 0.96, 95% confidence interval [CI]: 0.93–0.99) and by family physicians (APR: 0.12, 95% CI: 0.09–0.15) remained associated with a lower likelihood of M-CHAT completion.

TABLE 3

M-CHAT Screening Rate and Results, by Patients and Visits

TABLE 4

Factors Associated With M-CHAT Completion

M-CHAT Screening and ASD Diagnosis

Among the entire cohort (n = 36 223), 522 children (1.4%, 1 in 69) had been diagnosed with ASD at the time of our data pull (age range 57–107 months). The prevalence of ASD in children screened and not screened in both univariable and multivariable analyses was similar. In multivariable analysis, children who screened positive were 17 times more likely to be diagnosed with ASD than screen-negative children (APR: 17.0, 95% CI: 13.1–22.0) and 10 times more likely than children not screened (APR: 10.3, 95% CI: 7.6–14.1) (Table 5).
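As a rough check on the statement that ASD prevalence was similar in screened and unscreened children, a crude (unadjusted) prevalence ratio can be computed from counts given in the text. The sketch below is illustrative only: the function name `prevalence_ratio` is hypothetical, the interval is a standard log-scale Wald interval, and nothing here accounts for covariates or the multilevel structure, so it should not be expected to reproduce the adjusted estimates in Table 5.

```python
import math


def prevalence_ratio(cases_a, n_a, cases_b, n_b, z=1.96):
    """Crude prevalence ratio of group A vs group B with a log-scale Wald interval.

    Illustrative helper (not from the article): no covariate adjustment and
    no allowance for clustering within providers or clinics.
    """
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    ratio = p_a / p_b
    # Delta-method standard error of log(ratio) for two independent proportions
    se_log = math.sqrt((1 - p_a) / cases_a + (1 - p_b) / cases_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Counts stated in the Results text
cohort = 36223        # children in the cohort
screened = 26364      # children screened at least once
asd_total = 522       # children later diagnosed with ASD
asd_screened = 378    # of those, children who had been screened

# Derived by subtraction (not stated directly in the text)
unscreened = cohort - screened
asd_unscreened = asd_total - asd_screened

pr, lo, hi = prevalence_ratio(asd_screened, screened, asd_unscreened, unscreened)
print(f"Crude PR, screened vs unscreened: {pr:.2f} ({lo:.2f}-{hi:.2f})")
```

With these counts the crude ratio is close to 1 and its interval spans 1, which is in line with the reported similarity between the two groups.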

TABLE 5

Associations Between ASD Diagnosis and M-CHAT Completion or M-CHAT Screen-Positive

M-CHAT Screening and Age of ASD Diagnosis

There was no difference in the age of ASD diagnosis in children who were screened and not screened (46.5 months [95% CI: 43.5–49.6] versus 48.5 months [95% CI: 44.6–52.3], respectively, P = .34). However, among screened children later diagnosed with ASD, those who screened positive were diagnosed at a younger age compared with those who screened negative (38.5 months [95% CI: 34.2–42.7] compared with 50.6 months [95% CI: 47.0–54.1]; P < .001) (Table 6). Screen-positive children were also diagnosed 10 months earlier than children not screened (38.5 months [95% CI: 34.2–42.7] vs 48.5 months [95% CI: 44.8–52.2]; P < .001).

TABLE 6

Age of ASD Diagnosis by M-CHAT Completion and Result

Performance Characteristics of the M-CHAT

We assessed performance characteristics of the M-CHAT in predicting a diagnosis of ASD among the 26 364 children who had been screened and found the following: sensitivity of 33.1% (95% CI: 28.3–38.1), specificity of 97.8% (95% CI: 97.6–97.9), PPV of 17.8% (95% CI: 15.0–20.8), and NPV of 99.0% (95% CI: 98.9–99.1). There was a significantly higher sensitivity in nonwhite compared with white children and in children screened twice compared with once. The PPV in boys was higher compared with girls and in children from lower SES families compared with those from higher SES families (Table 7).
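The four measures above follow from a simple 2 × 2 table of screen result against later diagnosis. The sketch below shows the arithmetic with a hypothetical helper, `screen_metrics`. The true-positive and false-negative counts come from the Results text (378 screened children with ASD, of whom 165 + 88 screened negative). The false-positive count is not stated in the text; the value used here is an assumption, back-calculated to be consistent with the reported PPV, and should be replaced by the actual cell count from the article's tables.

```python
def screen_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2 x 2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # ASD cases that screened positive
        "specificity": tn / (tn + fp),  # non-ASD children that screened negative
        "ppv": tp / (tp + fp),          # positive screens later diagnosed with ASD
        "npv": tn / (tn + fn),          # negative screens not diagnosed with ASD
    }


# Counts stated in the text
screened = 26364          # children screened at least once
asd_screened = 378        # screened children later diagnosed with ASD
false_neg = 165 + 88      # screened negative once, or negative at both visits
true_pos = asd_screened - false_neg

# ASSUMPTION: not stated in the text; back-calculated from the reported PPV
false_pos = 577
true_neg = screened - true_pos - false_neg - false_pos

metrics = screen_metrics(true_pos, false_pos, false_neg, true_neg)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Because the NPV depends heavily on the low prevalence of ASD in the screened group, a high NPV can coexist with low sensitivity, as seen here.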

TABLE 7

Performance Characteristics of the M-CHAT Within the Entire Cohort and Subgroups in Predicting Later Diagnosis of ASD

M-CHAT Administration in Practices

After contacting all 20 participating practices, we were able to interview physicians from 12 practices who described how the M-CHAT was typically administered. All practices administered M-CHATs on paper because there were no electronic or online means to screen through the EHR. With regard to scoring, 10 practices used the recommended scoring, 1 used a mix of “eyeballing” completed screens and scoring, and 1 did not score M-CHATs. In 7 of 10 practices scoring the M-CHAT, medical assistants typically administered and scored the first step of the M-CHAT. One practice typically administered follow-up interviews, whereas 9 practices did not, and 1 practice used a case manager to “informally” clarify missed items using the follow-up interview as a guide. Referral patterns after positive screens were mixed. Three practices referred simultaneously for ASD evaluations and to Early Intervention Program (EIP). Three practices typically referred only to EIP, and 4 practices referred for ASD evaluations but not to EIP. Two practices had no set referral patterns.

Discussion

In this study of ASD screening in real-world practices within a large health care system, we found approximately three-fourths of children were screened for ASD at least once and half were screened at both recommended ages. Screening occurred less often in Hispanic children and at visits with family physicians. Children who screened positive were more likely to later be identified with ASD and were diagnosed at a younger age. Our ability to follow up on outcomes of screen-negative children allowed for broader estimates of performance characteristics than previous studies. In doing so, we found a lower sensitivity and PPV than previously estimated (33% and 18%, respectively). Qualitative data regarding lack of fidelity in screen administration and uneven referral patterns after positive screens provide context for why our findings may differ from studies of the M-CHAT with research support.

Previous estimates of ASD screening have varied significantly (between 17% and 81%) and relied on physician report of “usual practice.”^6,16 Our study, along with 2 recent studies, adds to understanding of ASD screening because estimates are based on data from actual visits rather than physician recall and indicates that, although ASD screening is far from universal, a high proportion of children are screened at least once.^10,11

With our study, we provide important insight into disparities in ASD screening; researchers in previous studies suggested that ASD screening in primary care reduces racial and ethnic disparities in ASD identification and age of diagnosis.¹⁷ Hispanic children in our cohort were less likely to be screened compared with non-Hispanic white children, a finding supported by one previous study.¹¹ We also found that family physicians rarely administered ASD screening tools. The American Academy of Family Physicians does not recommend universal ASD screening, which may explain differences in screening between pediatricians and family physicians.¹⁸ No previous studies have estimated rates of ASD screening among family physicians, although researchers of one qualitative study of a small group of family physicians found that, rather than screening, participants relied on developmental surveillance to identify children with ASD.¹⁹ Given that family physicians provide 16% to 21% of pediatric care, this new finding uncovers an opportunity to further increase ASD screening rates in the United States.²⁰

A strength of our study was the ability to longitudinally follow children seen for 18- and 24-month visits, allowing an estimate of performance characteristics of the M-CHAT not possible in previous validation studies.³ Even after controlling for potential confounding factors, we found a strong association between a positive ASD screen and a later diagnosis of ASD. This new finding supports the policy of ASD screening as a means of improving identification of children with ASD. We found that the age of diagnosis did not differ between screened and unscreened children. Although this could be interpreted as a lack of impact of screening, we did find a clinically important difference when we separated the screened group into those who screened negative and those who screened positive. Specifically, we found that children who screened positive had a diagnosis of ASD in the EHR 12 months earlier than those who screened negative and 10 months earlier than those not screened. Therefore, these findings add new evidence that when a child screens positive, screening may lower the age of diagnosis. We also conclude from these results that the lack of difference between the screened and unscreened groups is being driven by later diagnosis in the subgroup with false-negative results on screening. This result highlights the importance of developing a system or a screening instrument that can reduce false negatives. Our uncontrolled, real-world setting does not allow us to draw conclusions about whether false-negative screens are driven by children who were missed by the screening instrument or who were missed by a flawed system of screening and referrals. Our findings support the American Academy of Pediatrics recommendation for universal screening in that more children and younger children with ASD will have opportunities to access evidence-based therapies that improve outcomes.²

We identified a large number of children later diagnosed with ASD who screened negative (false negatives), leading to unexpectedly low sensitivity of the M-CHAT. This finding is consistent with previous studies in which researchers identified groups of children with ASD who passed a single M-CHAT despite having deficits in social communication, suggesting that providers should closely monitor for signs of ASD with developmental surveillance and broadband developmental screening tests and refer for concerns even after a negative ASD screen.^2,21–23 Similar to another study conducted in a real-world setting, we found that the sensitivity of the M-CHAT improved among children screened twice, suggesting that providers should screen at both visits.¹¹

We additionally found children who screened positive but were later identified as not having ASD in the EHR. This resulted in a lower PPV, 17.8%, than reported in the most recent validation study of the M-CHAT (47%) but closer to that in another study (14.6%) done in a similar real-world setting.^3,11 Our qualitative data revealed low usage of the recommended M-CHAT follow-up interview, which is intended to reduce false positives. Previous studies of the M-CHAT have used trained research assistants to administer the follow-up interview by phone; pediatricians might have difficulty completing follow-up interviews in the context of brief visits.^3,7 Interventions that have been shown to improve pediatrician completion of follow-up interviews include online or digital decision support tools.^24,25 Additionally, what we defined as “false positives” may in reality be children with ASD who were not referred for ASD evaluations after screening positive and thus not formally diagnosed, a practice that has been reported by providers in this and other studies.^10,16,22 In contrast, families of children who screened positive in validation studies of the M-CHAT were offered free and timely ASD evaluations, which facilitated identification and a young age of diagnosis. Families of children in our study, like those across the country, may struggle to access affordable and timely ASD diagnostic evaluations.^26,27 For example, there was an average of 646 days between a positive screening test and the first ASD codes entered into the EHR. This suggests that families were not referred after positive screens, had difficulty accessing evaluations once they were referred, or chose not to pursue ASD evaluations in a timely fashion. 
A similar study showed that, among children who screened positive, only 31% were referred for specialty evaluation and only 65% of those referrals were completed, suggesting that further interventions are needed to facilitate referral completion.¹⁰ Taken together, the lower-than-expected performance characteristics of the M-CHAT found in this study may have been due to system-level issues seen in real-world health care settings (not administering the screen twice, not administering the follow-up interview, not referring after positive screens, difficulty accessing ASD evaluations) and issues related to child characteristics or the screen itself, all of which warrant further study. Various strategies to address system-level barriers at the practice level, such as electronic decision support, clinical informatics reporting, and follow-up phone calls, may allow for tracking of positive screens and referral completion.²⁸

We found differences in performance characteristics of the M-CHAT in several subgroups. The M-CHAT had a lower sensitivity and lower PPV in non-Hispanic white children and children from higher SES backgrounds, respectively. Researchers of 2 previous studies assessed the accuracy of screening and found mixed results with regard to PPV in nonwhite children compared with white children.^9,11 We also identified a lower PPV in girls compared with boys, a finding that differs from the most recent validation of the M-CHAT but aligns with an M-CHAT study that systematically followed up with all children screened.^3,11 Given that girls with ASD are diagnosed less often and later, improving the accuracy of ASD screening in girls should be prioritized. Disparities in availability of screening and diagnostic services across subgroups have implications for psychometric properties of screeners; therefore, these results deserve more in-depth investigation in future studies.

The limitations of our study are related to reliance on EHR data to identify ASD screening and diagnoses. Without being able to clinically confirm ASD diagnoses, results regarding performance characteristics of the M-CHAT should be considered preliminary. However, the accuracy of ASD diagnoses in our study is supported by the similar prevalence of ASD in our cohort compared with national prevalence estimates of same-aged children.^5,29 Additionally, our estimates of time to diagnosis are based on when the diagnosis entered the EHR; they therefore likely overestimate the time by some unknown amount and should be interpreted with caution. Findings from a previous M-CHAT validation study suggest that we should have expected a screen-positive rate of 7.4% if not using the follow-up interview, and yet our screen-positive rate was only 2.7%.³ In a more recent study, however, done within a real-world health care setting that did not use the follow-up interview, researchers found a screen-positive rate similar to ours (3%).¹⁰ The reasons for a lower-than-expected screen-positive rate in real-world settings in which the M-CHAT follow-up interview is not used deserve further study. In our study, we hypothesize that the lower-than-expected proportion of positive screens could stem from several possible causes. One may be that physicians informally asked follow-up questions, although they reported not formally administering the follow-up interview. Another could be that errors in scoring of paper screeners resulted in positive M-CHATs recorded as negative, as was seen in a previous study that rescored paper forms of the M-CHAT.²⁵ If incorrect scoring of paper screeners was responsible for the lower-than-expected screen-positive rate in our study, health care systems could invest in electronic administration and scoring to improve the accuracy of ASD screening. 
Although the EHR data we had access to did not allow us to review the scoring of the M-CHAT, we are in the process of performing chart reviews of screen-negative children with ASD to investigate how M-CHAT scoring was performed and whether there were concerns related to ASD documented during visits, as reported in a study finding delays in social communication in children with ASD who screened negative at 18 months.²¹ We are also limited by reliance on data and processes from one large health system in Utah, which may not be representative of other health care systems or patient populations. Finally, there was a lack of information about Spanish language availability of the M-CHAT among the practices, although administering the M-CHAT in Spanish did not eliminate disparities in screen completion between English- and Spanish-speaking families in a previous study.¹¹ A closer investigation of the factors that might enhance efforts to increase the parity of screening rates in Hispanic families is needed. Despite these limitations, our study provides new information and perspective about ASD screening in real-world primary care practices.

Conclusions

Our results suggest progress toward universal ASD screening in primary care practices. Still, only half of children in our study were screened at both 18- and 24-month visits, suggesting that further work is needed to encourage providers to complete screens as recommended, with more efforts to screen Hispanic children. Further advocacy and education are needed to encourage family physicians to screen for ASD. More resources are needed for implementing ASD screening at the practice and health care system level, with particular attention to administering ASD screening tools with fidelity, prompt referral of children who are found to be at risk, and increasing the availability of ASD diagnostic providers to facilitate prompt ASD evaluations.

Acknowledgment

We thank Catherine Jolma, MD for her review of the article.

Footnotes

Address correspondence to Paul S. Carbone, MD, Department of Pediatrics, University of Utah, 295 Chipeta Way, Salt Lake City, UT 84109. E-mail: paul.carbone@hsc.utah.edu
FINANCIAL DISCLOSURE: Dr Campbell discloses that she is an inventor on a patent related to screening for autism spectrum disorder; the other authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Supported by the University of Utah Population Health Research Foundation, with funding in part from the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through grant UL1TR002538 (formerly 5UL1TR001067-05, 8UL1TR000105, and UL1RR025764). The research reported in this publication was supported in part by the Utah Stimulating Access to Research in Residency Transition Scholar under award number 1R38HL143605-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: Dr Campbell discloses that she is an inventor on a patent related to screening for autism spectrum disorder; the other authors have indicated they have no potential conflicts of interest to disclose.
COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2020-1467.