Longitudinal cohort studies of specific populations provide some of the most compelling evidence for research spanning epidemiology, medicine, and social science. The Framingham Heart Study (FHS) is a good example. Initiated in 1948, the FHS tracks a cohort of white men and women who reside in the town of Framingham, Massachusetts. The study population receives biennial medical exams and personal interviews, and an additional study has followed their offspring. The FHS has contributed much of our knowledge about cardiovascular disease incidence and prevalence, and its risk factors.
The FHS is not as useful for addressing health services research. Ideally, such a cohort could be used to answer key questions such as how insurance affects cardiovascular health, and whether preventive treatment (eg, with antihypertensives) can result in long-term health-care cost savings. Because the FHS was designed for epidemiologic purposes, insurance information is limited and there are no attendant cost data. Framingham is not representative of the US population as a whole, and the study did not track people who migrated away.
These points are not meant as criticism of the FHS (the study was remarkably innovative and is still yielding important insights more than 50 years later) but to emphasize that the design of a cohort study should not only address contemporaneous research issues but also try to anticipate and address broader questions. In the current context, Calhoun and Bennett propose a large-scale cohort study to investigate the costs, both direct and indirect, of treating cancer. The proposed study would be similar to the HIV Cost and Services Utilization Study, a $35 million effort to provide representative cost estimates for treating human immunodeficiency virus (HIV). To justify the expense of such an undertaking, a cohort study must address more general questions posed by the cancer research community.
The Health and Retirement Study (HRS) provides a useful prototype. Started in 1992, the HRS now surveys more than 22,000 Americans over the age of 50 every 2 years. The study paints a broad landscape of an aging America's health, financial status, family support systems, labor market status, and health insurance.[1] The result is a survey that can be used to address research questions across many disciplines, thereby justifying the enormous expense required to find and track individuals.
Maintaining similarity with the HRS could allow for useful benchmarking between a cancer population and a noncancer cohort. The HRS is limited because of the absence of cancer-specific treatment and history, but these modules could easily be developed and added. A cancer cohort study (and the HRS, too) should consider collecting two other data sources that could greatly augment its research value: medical records and genetic material for future research purposes.
If the goal really is to produce generalizable cost estimates (direct and indirect), it is worth considering more limited but less expensive designs. Cost estimates from the HIV Cost and Services Utilization Study assigned prices to self-reported measures of utilization. The prices themselves actually came from secondary data sources. For example, the cost of a hospital day came from a study in the early 1990s that collected actual reimbursement data. It was then multiplied by patient self-reports of how many days they spent in the hospital during a 6-month period. This imputation is reasonable for analyses that intend to use costs as a dollar-denominated measure of resource utilization.
Calhoun and Bennett propose a similar imputation. They would delineate utilization and then assign prices to those services to estimate costs. The source of utilization data may be patient self-report or medical record abstraction. Patient recall becomes more difficult as the reference window expands, but the use of diaries and shorter recall periods can improve recall tremendously. Once utilization is obtained, prices could be assigned to services based on secondary data such as Medicare fee schedules.
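The imputation described above amounts to multiplying each reported service count by a unit price and summing. A minimal sketch follows, assuming hypothetical service names and made-up unit prices; none of these figures comes from the HIV study, Medicare fee schedules, or any other real source.

```python
from typing import Dict

# Hypothetical unit prices in dollars. In practice these would come from
# a secondary source such as a fee schedule; the values here are made up.
UNIT_PRICES: Dict[str, float] = {
    "hospital_day": 1000.0,
    "office_visit": 80.0,
}


def impute_cost(utilization: Dict[str, float],
                unit_prices: Dict[str, float]) -> float:
    """Return the imputed cost: sum of (service count x unit price)."""
    missing = set(utilization) - set(unit_prices)
    if missing:
        # Refuse to silently drop services that have no assigned price.
        raise KeyError(f"no unit price for: {sorted(missing)}")
    return sum(count * unit_prices[service]
               for service, count in utilization.items())


# Hypothetical self-report covering one 6-month recall window.
patient_report = {"hospital_day": 3, "office_visit": 4}
total = impute_cost(patient_report, UNIT_PRICES)
print(f"Imputed 6-month cost: ${total:,.2f}")
```

Because every patient is assigned the same unit price, the result is best read as a dollar-denominated measure of resource utilization rather than as what any payer actually reimbursed.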
Other alternatives are available. One of the best ways to measure costs is to get data from insurers. Out-of-pocket expenses and noncovered expenses are not kept in the database, but can be reliably inferred from information about the coverage or through patient elicitation. There is at least one successful variant of such a design. The Surveillance, Epidemiology, and End Results (SEER)/Medicare database maintained by the National Cancer Institute consists of Medicare billing records linked to tumor registry information for cancer patients registered in the SEER database.
A Medicare-based cohort study of the direct and indirect costs of cancer has several advantages. The sample frame is easily constructed using Medicare claims and enrollment data (either with or without registry matching). The alternative approach in a large probability sample is unclear but would certainly entail much more expense. Drawing on the HIV example, an HIV Cost and Services Utilization Study-like sample would involve multistage sampling and negotiation with hundreds of providers to assemble lists of cancer patients. Significant provider or registry nonresponse has the potential to jeopardize a study before it begins.
A Medicare-based cohort would also have claims data readily available in one format, with algorithms for cleaning and processing that have already been developed. It would not require imputations of the sort needed for the HIV study. In addition, cost information could be obtained on deceased patients and nonrespondents, unlike population-based sample designs. These populations are often missed in probability samples.
Many of the concerns about using a Medicare-based approach can be overcome. SEER is not geographically representative of the entire United States, but an augmented sample could be drawn using registries outside SEER. Information on care not covered by Medicare (most importantly, prescription drugs) could be collected through patient survey and review of medical records, as in the Cost of Cancer Treatment Study.[2] The periodic survey would also be used to elicit indirect costs.
The most serious concern is that a Medicare-based cohort study would only be representative of elderly patients with cancer. This limitation would need to be balanced against the readily accessible sample frame and more reliable cost data for respondents and nonrespondents. Ultimately, though, if a national probability sample is attempted, it is hoped that the data collected will be of sufficient quality to allow the investigation of a host of pressing issues.
Financial Disclosure: The author has no significant financial interest or other relationship with the manufacturers of any products or providers of any service mentioned in this article.
1. The Health and Retirement Study: A Longitudinal Study of Health, Retirement, and Aging. Sponsored by the National Institute on Aging. Available at http://hrsonline.isr.umich.edu. Accessed December 5, 2002.
2. Goldman D, Schoenbaum ML, Potosky A, et al: Measuring the incremental cost of clinical cancer research. J Clin Oncol 19:105-110, 2001.