Precision in patient data: How genetic databases are helping to shape rare disease population studies

Written by Olivia Seifert, content strategist at HealthLumen

Rare diseases are defined as conditions affecting no more than 1 in 2,000 individuals by European standards1 or fewer than 200,000 individuals at any one time in the United States.2 However, accurately determining the true prevalence of rare diseases (the proportion of the population affected by these conditions at a given point in time) is challenging.

The difficulty of obtaining reliable prevalence estimates for many, and perhaps most, rare diseases can restrict efforts to accelerate the development of therapies, allocate adequate resources, and raise awareness of the rare disease community. Considering that over 90% of rare diseases still have no available treatments,3 generating accurate estimates for the rare disease patient population is a high priority.

With some 80% of rare diseases having an identified genetic basis,4 the availability of large-scale genetic databases is beginning to help turn the tide, offering unprecedented insights into these conditions. Because these databases record the genetic makeup of diverse populations at scale, mining them allows the number of individuals carrying disease-associated genetic variants to be calculated with greater accuracy and detail than ever before.

The improved prevalence estimates can then be fed into statistical models that project the future trajectory of rare diseases and the impact that proposed interventions, such as access to critical therapies, would have on patients in need. Generating reliable evidence on the health and economic benefits of such interventions can help support the case for mobilising transformative change for the rare disease community.

The challenge of determining how rare is “rare”

So why is it so difficult to determine accurate prevalence figures for rare disease patient populations?

Individuals with rare diseases often face substantial delays in their quest for a diagnosis (the “diagnostic odyssey”), with many being misdiagnosed or never receiving a diagnosis at all. Much of this comes down to the complexity of rare diseases, both genetically and phenotypically (that is, in how symptoms physically manifest). As a result, medical professionals often do not recognise the symptoms that a rare disease patient presents with, and may not even be aware that the disease exists. Some rare diseases are also late-onset, so individuals may live most of their lives unaware that they have a rare disorder. There are also considerable global disparities in the opportunity for rare disease diagnosis: limited access to healthcare, the cost of diagnosis, and geographical remoteness are all further potential barriers.

As a consequence, estimates of rare disease patient population size that rely purely on reported cases are likely to be inaccurate. Historically, the standard approach to producing rare disease prevalence estimates has been to compile information from systematic literature reviews, data registries, surveillance programs and rare disease specialists. None of these methods accounts for the fact that many individuals with rare diseases are never diagnosed and, consequently, never counted. In fact, several recent studies have suggested that initial patient population figures were likely underestimated for various conditions.5-8

Closing the gap on rare disease prevalence estimates using genetic databases

Genetic databases offer a unique opportunity to address the challenges of accurately estimating prevalence for rare diseases with a genetic basis. By scanning large genetic datasets that contain the genetic profiles of thousands of individuals, it is possible to count the number of individuals carrying the relevant genetic variants associated with a particular rare disease.

These figures can then be extrapolated to estimate how many individuals in other populations of interest would be expected to carry the same disease-associated variants. When doing this, it is important to account for the fact that some ethnic groups may be at greater risk than others. The typical age of onset of the disease and its mode of inheritance (for example, whether it is dominant or recessive) are also important factors to consider. Additionally, calculations must take into account that not all carriers of a disease variant will necessarily develop symptoms, a genetic effect called “incomplete penetrance”.
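The adjustments described above can be sketched in a few lines of Python. This is a minimal illustration, not HealthLumen's method: it assumes Hardy-Weinberg proportions for an autosomal variant, and the function names, group labels and allele frequencies are all hypothetical. X-linked conditions would need separate handling for males and females.

```python
def expected_affected_fraction(allele_freq, inheritance, penetrance=1.0):
    """Expected symptomatic fraction of a population for one autosomal variant.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    q = allele_freq
    if inheritance == "dominant":
        # One or two copies of the variant are enough to be at risk.
        at_risk = 1 - (1 - q) ** 2
    elif inheritance == "recessive":
        # Two copies of the variant are required.
        at_risk = q ** 2
    else:
        raise ValueError(f"unsupported inheritance mode: {inheritance}")
    # Incomplete penetrance: only a share of at-risk genotypes show symptoms.
    return at_risk * penetrance


def weighted_prevalence(group_allele_freqs, group_shares, inheritance, penetrance=1.0):
    """Map group-level allele frequencies onto a target population structure."""
    total_share = sum(group_shares.values())
    return sum(
        (share / total_share)
        * expected_affected_fraction(group_allele_freqs[group], inheritance, penetrance)
        for group, share in group_shares.items()
    )


# Hypothetical inputs: two ancestry groups with different allele frequencies.
allele_freqs = {"group_a": 0.01, "group_b": 0.02}
population_shares = {"group_a": 0.75, "group_b": 0.25}

prevalence = weighted_prevalence(allele_freqs, population_shares, "recessive", penetrance=0.8)
print(f"Estimated prevalence: about 1 in {round(1 / prevalence):,}")
```

Age of onset is left out here; a fuller model would apply an age-specific onset probability to each age band of the target population.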

Accounting for these factors, genetic database analysis offers the opportunity to complement, refine and, in some cases, validate, rare disease patient population estimates that are based solely on the number of reported cases.

Case study: Fabry disease

To illustrate the potential of genetic database analysis, we can summarise a recent example of our research at HealthLumen. Our aim was to better understand the US patient population size of Fabry disease, a rare neurological genetic disorder involving the deficiency of an enzyme called alpha-galactosidase-A, which results in an accumulation of fatty materials known as lipids throughout the body.9 Consequently, harmful levels of lipids build up in the nervous system, and frequently in the eyes, skin, kidneys and heart. Common symptoms include burning pain in the arms and legs, clouding of vision and impaired blood circulation. Symptoms usually appear in childhood or adolescence, but can also first appear later in life.

Alpha-galactosidase-A enzyme

Estimates based on reported clinical cases put the global prevalence of Fabry disease somewhere between 1 in 40,000 and 1 in 170,000 individuals.10–12 However, some newborn screening studies have indicated that the prevalence may be much higher, ranging from 1 in 1,250 to 1 in 21,973 individuals,13–16 although the dried blood spot testing methods used in these studies limit the reliability of their results. More recently, a study using genetic database data estimated that 1 in 5,732 individuals carry genetic variants known to be associated with late-onset Fabry disease, and 1 in 200,643 carry those causing classic Fabry disease.17

Given these widely varying estimates, at HealthLumen we examined the prevalence of Fabry-related genetic variants in the most recent version of the gnomAD genetic database and mapped these findings to the US population structure. We found that 1 in 6,994 people in the US would be expected to carry a genetic variant for Fabry disease. This provides a robust framework for validating the most recent findings on the prevalence of Fabry disease-causing variants, and gives pharmaceutical companies, healthcare organisations and advocacy groups confidence that they are working with the right patient figures as they investigate how best to support the Fabry community.
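To see how far apart these estimates are, it can help to put the “1 in N” figures quoted in this case study on a common scale. The short Python sketch below does only that arithmetic. The helper names are hypothetical, and the population of 1,000,000 is an illustrative round number rather than a US population figure.

```python
def per_100k(one_in_n):
    """Convert a '1 in N' figure to a rate per 100,000 people."""
    return 100_000 / one_in_n


def expected_count(population_size, one_in_n):
    """Expected number of individuals in a population of the given size."""
    return population_size / one_in_n


# '1 in N' figures as quoted in the article.
estimates = {
    "Reported cases, upper bound": 40_000,
    "Reported cases, lower bound": 170_000,
    "Newborn screening, upper bound": 1_250,
    "Newborn screening, lower bound": 21_973,
    "Genetic database study, late-onset variants": 5_732,
    "gnomAD analysis mapped to the US": 6_994,
}

for label, n in estimates.items():
    print(f"{label}: 1 in {n:,} = {per_100k(n):.2f} per 100,000")

# Illustrative only: expected carriers in a hypothetical population of 1,000,000.
print(round(expected_count(1_000_000, 6_994)))
```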

For the many rare genetic diseases where no estimate of the patient population size is available at all, mining genetic databases is the most effective way to close the data gaps.

Peering into the future

Producing reliable estimates of the current rare disease patient population size is an important step. But it is equally important to understand future trends: how are these populations likely to evolve, which risk factors will influence them, and which interventions will be most effective?

At HealthLumen, we apply powerful computer-based modelling techniques to project the future burden of diseases. Our technology is based on “microsimulation” modelling, which involves creating a “virtual population” that represents the real, changing characteristics of individuals within a population of interest, to quantify how many individuals are expected to be affected by a rare disease in the future. It can also be used to estimate the health and economic benefits of interventions, such as the introduction of a new cell or gene therapy or earlier screening programs, before real-world implementation. Using this modelling approach, different “what-if” scenarios can be drawn up and compared in order to find the best solutions for addressing the needs of the rare disease community in question.
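As a rough illustration of the microsimulation idea, the toy Python sketch below builds a virtual population of individuals and steps it forward year by year. It is not HealthLumen's model. All names and parameter values are hypothetical, and it ignores births, deaths and migration, which a real model would include.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    carrier: bool
    symptomatic: bool = False


def simulate(n_people, years, carrier_prob, annual_onset_prob, seed=0):
    """Return the number of symptomatic individuals at the end of each year."""
    rng = random.Random(seed)  # fixed seed so a scenario can be reproduced
    population = [
        Person(age=rng.randint(0, 80), carrier=rng.random() < carrier_prob)
        for _ in range(n_people)
    ]
    symptomatic_by_year = []
    for _ in range(years):
        for person in population:
            person.age += 1
            # Each asymptomatic carrier has a yearly chance of disease onset.
            if (
                person.carrier
                and not person.symptomatic
                and rng.random() < annual_onset_prob
            ):
                person.symptomatic = True
        symptomatic_by_year.append(sum(p.symptomatic for p in population))
    return symptomatic_by_year


# Two hypothetical "what-if" scenarios with made-up parameters: a baseline,
# and an intervention assumed to halve the yearly probability of onset.
baseline = simulate(100_000, years=10, carrier_prob=0.001, annual_onset_prob=0.05)
intervention = simulate(100_000, years=10, carrier_prob=0.001, annual_onset_prob=0.025)
print("Baseline:    ", baseline)
print("Intervention:", intervention)
```

Because each individual is tracked separately, characteristics such as age, sex or ethnicity can drive different risks for different people, which is what distinguishes this approach from a model that works only with population averages.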

Taking a holistic approach to understanding rare disease patient demographics

Establishing reliable rare disease patient number estimates using genetic database analysis and implementing microsimulation approaches to model into the future are important tools in forming a comprehensive picture of the rare disease patient population. This is critical for informing decisions across the whole of the drug development lifecycle—from determining research and development priorities for cell and gene therapies, through to establishing efficient market access strategies—in addition to mobilising policy change.

For example, modelling the current and future health and economic burden of rare diseases helps pharmaceutical companies better understand the unmet needs of rare disease patients, helping them to identify where therapies are most urgently needed, and where to prioritise their efforts for drug research and development. Robust patient population size estimates also allow pharmaceutical companies to determine the potential market size for rare disease drugs, which can be important in mobilising investors and partners during the drug development process. This may be especially useful where estimates are currently non-existent, or where there has likely been underestimation of the real rare disease patient population size.

Further, with a better understanding of how many people carry a rare disease and whether certain ethnic groups are disproportionately affected, clinical trials can be more efficiently designed to better represent the population who will ultimately benefit from the therapy being developed. Robust estimates of the number of people carrying a rare genetic disease are also an essential input for health technology assessments (HTAs) for prospective therapies when submitting applications to regulatory bodies.

Additionally, it is important that healthcare systems have reliable patient number estimates to be able to allocate adequate resources to the rare community, and accurately projecting what the rare disease landscape may be in the future is important for anticipating future demands.

Armed with accurate patient numbers, and with data-driven evidence of the potential societal and economic benefits of interventions—including newborn screening programmes—policymakers, advocacy organisations and patient groups can more effectively push for the legislative initiatives and public health programmes that the rare community so critically needs. Greater understanding regarding how many people are really affected by rare genetic diseases is also an asset to patient advocacy groups campaigning for better public awareness and education on rare genetic diseases, which are important tools in facilitating earlier disease detection and diagnosis.

For rare genetic disease patients, their families, and society as a whole, tapping into these powerful methodologies could lay the foundations of more inclusive healthcare systems that address the needs of everyone, regardless of the rarity of their condition.

To find out more about the work of HealthLumen, please email the HealthLumen team.


  1. Council Recommendation of 8 June 2009 on an action in the field of rare diseases
  2. U.S. Government. 107th Congress Public Law 280. Rare Diseases Act of 2002
  3. Rare diseases, common challenges
  4. Rare Genetic Diseases
  5. Primary sclerosing cholangitis in children and adolescents
  6. Newborn screening for Fabry disease in the north-west of Spain
  7. The earlier, the better: Impact of early diagnosis on clinical outcome in idiopathic pulmonary fibrosis
  8. Amyloid heart disease: genetics translated into disease-modifying therapy
  9. Fabry Disease
  10. α-Galactosidase A Deficiency: Fabry Disease
  11. Prevalence of Lysosomal Storage Disorders
  12. The frequency of lysosomal storage diseases in The Netherlands
  13. High Incidence of Later-Onset Fabry Disease Revealed by Newborn Screening
  14. Newborn screening for Fabry disease in the western region of Japan
  15. Newborn Screening for Lysosomal Storage Disorders in Illinois: The Initial 15-Month Experience
  16. Newborn screening for Fabry disease in Taiwan reveals a high incidence of the later‐onset GLA mutation c.936+919G>A (IVS4+919G>A)
  17. Prevalence of Fabry disease-causing variants in the UK Biobank
