Synthetic Data FAQ

This document answers frequently asked questions about the synthetic data released in 2021 for the APIs that provide access to Medicare claims data.

Please see the Synthetic Data Guide for more information.

How do I access the synthetic data? How are the data useful to my organization? What exactly do the data contain?

See the Synthetic Data Guide.

What will happen to the other types of synthetic data? Are they being replaced?

The previously released sets of synthetic data will persist indefinitely; they are not being replaced. This new set of synthetic data will instead provide additional synthetic beneficiaries. Either or both sets of synthetic data can be used going forward.

What are the differences between this most recent set of synthetic data and the old set?

See the Release History section of the Synthetic Data Guide.

How often are the data updated?

The update schedule is still to be determined. However, the current goal is to keep improving the way we generate synthetic data and to regularly add new sets of synthetic beneficiaries and claims that take advantage of those improvements.

Are field values in the synthetic data consistent with real values and expected formats?

Yes, they are consistent with expected formats, and we are working to make them as consistent with real records and values as is reasonable.

Does the synthetic data contain at least one record with each possible code for all coded values?

Not yet, though we plan to provide that coverage in future releases.

Does the synthetic data contain all fields? Meaning if I write a parser for the ndjson files produced by the sandbox environment and that parser works, will my parser also work in production?

Not yet, though we're planning to increase coverage of fields in future releases. As of August 2021, the synthetic data includes the following maximum percentages of possible fields (any given record may contain fewer):

Field type	Maximum percentage covered
Beneficiary fields	27%
Beneficiary History fields	70%
Inpatient fields	83%
Outpatient fields	75%
Carrier fields	79%
Prescription fields	93%
DME fields	84%
HHA fields	63%
Hospice fields	66%
SNF fields	73%
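
Because field coverage is partial, a parser that works against sandbox output may still meet fields in production that it has never seen, and fields it relies on may be missing from a given record. The following is a minimal sketch of one defensive approach, not the project's own tooling: read the ndjson one line at a time and look up nested fields with a default instead of assuming they exist. The helper names iter_ndjson and get_path and the two-record sample are hypothetical and used only for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def get_path(resource: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path through nested dicts; return default if a key is absent."""
    current: Any = resource
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical sample: the second record lacks billablePeriod entirely.
sample = [
    '{"resourceType": "ExplanationOfBenefit", "id": "a", "billablePeriod": {"start": "2020-01-01"}}',
    '{"resourceType": "ExplanationOfBenefit", "id": "b"}',
]

starts = [get_path(record, "billablePeriod.start") for record in iter_ndjson(sample)]
print(starts)  # ['2020-01-01', None]
```

In practice the same iter_ndjson function can be passed an open file handle in place of the sample list, since a file object also yields lines.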

Do the data types for fields in the synthetic data always match the data types in the production data (string, bool, integer)?

Yes, they should.
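
If you want to confirm this for the fields your own code depends on, one option is to record which JSON types appear for each field in two exports and compare the results. The sketch below only inspects top-level fields, and the function names and sample records are hypothetical; it is an illustration of the idea, not a validation tool provided with the APIs.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def field_types(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each top-level field name to the set of Python type names observed."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return dict(seen)


def type_mismatches(
    first: Dict[str, Set[str]], second: Dict[str, Set[str]]
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return fields present in both maps whose observed type sets differ."""
    shared = first.keys() & second.keys()
    return {n: (first[n], second[n]) for n in shared if first[n] != second[n]}


# Hypothetical records: "id" is a string in one export and an integer in the other.
export_a = ['{"id": "1", "active": true}']
export_b = ['{"id": 1, "active": true}']

mismatches = type_mismatches(field_types(export_a), field_types(export_b))
print(mismatches)  # {'id': ({'str'}, {'int'})}
```

An empty result means no shared top-level field changed type between the two inputs; it says nothing about fields that appear in only one of them.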

Does the size distribution of each EOB in synthetic data generally match the size of production EOBs?

To be determined, though the answer is likely no for the time being.

Where can I ask questions not answered here?

Join the Google Groups for any APIs you access and ask there:

Blue Button 2.0 (BB2.0): https://groups.google.com/g/developer-group-for-cms-blue-button-api
Beneficiary Claims Data API (BCDA): https://groups.google.com/forum/#!forum/bc-api
Data at the Point of Care (DPC): https://groups.google.com/forum/#!forum/dpc-api
Medicare Claims Data to Part D Sponsors (AB2D): https://groups.google.com/g/ab2d-api
