Sources of Secondary Data

The Indiana University School of Medicine Department of Surgery’s Center for Outcomes Research in Surgery (CORES) provides support in acquiring secondary data sources that can be used independently or linked with other databases.

Data source inventory

The National Inpatient Sample (NIS), developed for the Healthcare Cost and Utilization Project, is a publicly available, longitudinal, all-payer inpatient health care database that generates national estimates of health care utilization, access, quality, and outcomes from hospital inpatient stays. Since its 2012 redesign, the NIS has drawn an approximately 20 percent stratified sample of all discharges from U.S. community hospitals, excluding rehabilitation and long-term acute care hospitals.

The Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP), a joint program of the American College of Surgeons and the American Society for Metabolic and Bariatric Surgery, collects outcome data from all accredited bariatric surgical centers. The participant use data file (PUF) contains aggregated patient-level data and gives researchers at participating sites a resource for examining the quality of care delivered to metabolic and bariatric surgical patients. The PUF is provided at no additional cost to employees of MBSAQIP participant centers.

The American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) is a nationally validated, risk-adjusted, outcomes-based program that assesses and aims to improve the quality of surgical care in the private sector. NSQIP collects data on perioperative procedures and outcomes and tracks patients for 30 days after their operation to monitor preventable complications as a measure of the quality of surgical care. ACS NSQIP helps hundreds of hospitals across the country evaluate the quality of their surgical programs and make informed decisions on improving surgical outcomes.

The National Cancer Database (NCDB) is a nationally recognized oncology database that collects clinical and hospital-based information from more than 1,500 Commission on Cancer (CoC)-accredited facilities. The NCDB is co-sponsored by the American College of Surgeons and the American Cancer Society. NCDB data track patients with malignant neoplastic diseases, their treatments, and their outcomes, and represent more than 70 percent of newly diagnosed cancer cases nationwide. The NCDB also provides more information on the most commonly reported cancer types, participant user files, and public use data.

The National Trauma Data Bank (NTDB), created in 1989 by a collaborative group of the American College of Surgeons Committee on Trauma, is a national aggregation of U.S. trauma registry data collected centrally from more than 600 verified trauma centers and hospitals throughout the U.S. and Puerto Rico. It contains more than 2 million records. Lists of variables and their standardized operational definitions can be obtained from the NTDB, along with more information and the steps to acquire data.

In 2008, the Oregon Health Plan (OHP), Oregon’s Medicaid program, was expanded through a lottery that served as a randomized controlled trial, extending Medicaid coverage to uninsured residents up to 100 percent of the federal poverty level. Eligible residents were randomly selected from a pool of uninsured individuals to receive an OHP application package. The study then followed both those who were selected by the lottery and those who were not at six months and one year after baseline. The variables collected include, but are not limited to, insurance coverage patterns, access to care, health care utilization, financial stress, medical debt, and self-reported physical and mental health status.

The Quality of Life study was a prospective study that followed traumatically injured patients from the Presley Memorial Trauma Center (PMTC) in Memphis for 12 months. The study used standardized instruments to collect information: alcohol abuse (AUDIT-10), the Charlson Comorbidity Index, the Center for Epidemiologic Studies Depression Scale (CESD-20), the Drug Abuse Screening Test (DAST-20), the Functional Independence Measure (FIM-18), the Health and Retirement Study (HRSRS-51), the Multidimensional Scale of Perceived Social Support (MSPSS-12), body-part-specific pain ratings, the PTSD Checklist-Civilian Version (PCLC-17), and the SF-36. For more detailed information on the variable list and the data, contact CORES.

Since 1957, the National Health Interview Survey (NHIS) has collected data from a statistically representative sample of the U.S. civilian noninstitutionalized population through in-person household interviews. The survey covers several health topics, including the scope, distribution, and effects of illness and disability and the services available. The public health research community uses the survey results for epidemiologic and policy analysis of health problems, for determining barriers to health care access, and for monitoring federal health programs to track progress toward national health objectives.

The Behavioral Risk Factor Surveillance System (BRFSS) is the largest national health survey, collecting data from more than 400,000 adults annually in the U.S. on their health risk behaviors, chronic health conditions, and use of preventive services. BRFSS data can be used to estimate the prevalence of health conditions and risk behaviors, to track trends over time, and to produce local area estimates.

The National Health and Nutrition Examination Survey (NHANES), from the National Center for Health Statistics, has been used to assess the health and nutritional status of adults and children in the United States since the early 1960s. Since 1999, the survey has annually interviewed, and collected physical examination information from, a nationally representative sample of about 5,000 people in the U.S. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests.

The National Survey on Drug Use and Health (NSDUH), sponsored by the Substance Abuse and Mental Health Services Administration, provides national and state-level data on the use of tobacco, alcohol, and illicit drugs (including nonmedical use of prescription drugs) and on mental health status in the United States. The NSDUH is an annual nationwide survey, in use since 1979, that involves interviews with approximately 70,000 randomly selected individuals aged 12 and older.

The National Youth Tobacco Survey (NYTS) provides comprehensive national data on long-term, intermediate, and short-term indicators for the design, implementation, and monitoring of tobacco control programs. NYTS data help compare state-level measures of youth tobacco use with national measures. The NYTS provides nationally representative data about middle and high school youth, including their:

  • Tobacco-related beliefs
  • Attitudes
  • Behaviors
  • Exposure to pro- and anti-tobacco influences

Data and detailed documentation for the survey are available for the years 1999-2017.

The National Adult Tobacco Survey (NATS) helps examine the prevalence of tobacco use and the factors associated with tobacco use among adults. The data are representative and comparable at both the national and state levels. The NATS questionnaire focuses on the Office on Smoking and Health’s Key Outcome Indicators in four goal areas:

  • Preventing initiation of tobacco use
  • Eliminating nonsmokers’ exposure to secondhand smoke
  • Promoting quitting among adults and young people
  • Identifying and eliminating tobacco-related disparities

Data and detailed documentation for the survey are available for the years 2009-2010, 2012-2013, and 2013-2014.

The Youth Risk Behavior Surveillance System (YRBSS) is a national school-based survey conducted every two years by the CDC together with state, territorial, and local education and health agencies and tribal governments. It has collected data from high school students since 1991 to monitor the prevalence of obesity and asthma, as well as the following health risk behaviors among youth:

  • Behaviors that contribute to unintentional injuries and violence
  • Sexual behaviors related to unintended pregnancy and sexually transmitted diseases, including HIV infection
  • Alcohol and other drug use
  • Tobacco use
  • Unhealthy dietary behaviors
  • Inadequate physical activity

The Medical Expenditure Panel Survey (MEPS) is a set of large-scale surveys of families and individuals, their medical providers, and employers across the U.S. MEPS covers health care cost and utilization and health insurance coverage. Its two main parts are the household component and the insurance component. The household component (MEPS-HC) provides information on demographics, health status, health care utilization, access to care, satisfaction with care, income, and employment for the families and individuals in a household. The insurance component collects data from a sample of private and public sector employers on the health insurance plans they offer their employees. MEPS also includes a medical provider component, which collects information from the hospitals, physicians, home health care providers, and pharmacies identified by MEPS-HC respondents.

The Tobacco Use Supplement to the Current Population Survey, sponsored by the National Cancer Institute, is a survey of tobacco use that has been administered every three to four years since 1992-93 as a supplement to the U.S. Census Bureau’s Current Population Survey. Researchers can use these data to:

  • Monitor the progress in tobacco control activities
  • Conduct tobacco-related research
  • Evaluate tobacco control programs

The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program provides information on cancer incidence and survival in the U.S. from 19 population-based cancer registries. The SEER registries cover 28 percent of the U.S. population and collect information on patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for vital status. A signed research data agreement is required to access these data. SEER data currently span 1973-2014.

The National Association of County and City Health Officials (NACCHO) tracks trends in local health department infrastructure and population health activities over time. The National Profile of Local Health Departments is the largest, most reliable source of data on local health departments and collects information on infrastructure, workforce, finance, governance, activities, and services. NACCHO’s Forces of Change surveys assess the impact of trends affecting local health departments, including health reform, economic factors, and accreditation.

The Compressed Mortality data include death counts and population counts for all U.S. counties since 1968. Counts and rates of death can be obtained by underlying cause of death, state, county, age, race, sex, and year. ICD codes are used to define the underlying cause of death.
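As a hedged illustration of how a death count and a population count combine into a rate, the sketch below computes a crude death rate per 100,000 population. The function name and the example figures are hypothetical; they are not drawn from the Compressed Mortality data, and the calculation is not age-adjusted.

```python
def crude_death_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Return the crude (not age-adjusted) death rate per `per` population."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return deaths * per / population


# Hypothetical county-year: 250 deaths in a population of 125,000.
example_rate = crude_death_rate(250, 125_000)
print(f"{example_rate:.1f} deaths per 100,000 population")
```

Comparing counties or years with different age structures generally calls for age-adjusted rates, which this sketch does not attempt.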

The Area Health Resources Files (AHRF) data include information on health care professionals, health facilities, population characteristics, economics, health professional training, hospital utilization, hospital expenditures, and environment at the county, state, and national levels in the U.S. The AHRF data are obtained from more than 50 sources.

HRSA’s Bureau of Health Workforce has conducted a sample survey of registered nurses approximately every four years since 1977 to examine trends in, and project the future supply of, nursing resources. The data are available at the county and state levels.

The American Hospital Association (AHA) has collected proprietary hospital and health system data since 1946. The data include more than 1,000 data points provided by more than 6,300 hospitals and 400 health care systems in the U.S. The database captures hospital demographics and characteristics, hospital organization, staffing, leadership, and many more variables required for health care and health services analysis. The AHA data can also be linked with other data sources to support analyses in which hospital characteristics are among the variables of interest.
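Linking hospital-level data to another source usually means joining records on a shared hospital identifier. The minimal sketch below shows one way to do a left join in plain Python. The field names (`hospital_id`, `bed_count`, `los_days`) and all values are hypothetical placeholders, not actual AHA variable names, and a real linkage would depend on the identifiers available in each dataset.

```python
def link_records(records, hospitals, key="hospital_id"):
    """Left-join patient-level records to hospital characteristics on `key`.

    Records with no matching hospital are kept unchanged.
    """
    lookup = {h[key]: h for h in hospitals}
    linked = []
    for rec in records:
        hospital = lookup.get(rec[key], {})
        # Record fields take precedence if a field name appears in both.
        linked.append({**hospital, **rec})
    return linked


# Hypothetical hospital characteristics and patient-level records.
hospitals = [
    {"hospital_id": "H1", "bed_count": 350},
    {"hospital_id": "H2", "bed_count": 120},
]
records = [
    {"hospital_id": "H1", "los_days": 4},
    {"hospital_id": "H3", "los_days": 2},  # no matching hospital
]

linked = link_records(records, hospitals)
for row in linked:
    print(row)
```

Keeping unmatched records, rather than dropping them, makes it easier to check how many observations failed to link before any analysis.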