Supplementary Materials

Enhancing Temporo-Geospatial Epidemiological Analysis of Avian Influenza using Phylogeography

by Dhananjai M. Rao



Intercontinental migratory waterfowl are the primary vectors for dispersion of H5N1 viruses and have been implicated in several zoonotic epidemics and pandemics. Recent investigations have established that with a single mutation, the virus gains the ability to transmit between humans. Consequently, there is a heightened urgency to identify innovative approaches to proactively mitigate emergent epidemics. Accordingly, a novel methodology combining temporo-geospatial epidemiology and phylogeographic analysis of viral strains is proposed to identify critical epicenters and epidemic pathways, along with high-risk candidate regions for increased surveillance.


Infection transmission pathways between 22 high-risk waterfowl species, obtained from epidemiological analysis. Dominant pathways were identified through annotation using phylogeographic data computed from the phylogram of 2,417 H5N1 HA isolates (from the GISAID EpiFlu database). Annotation of infection pathways in turn delineated 23 influential clades out of the 130 clades in the phylogram.


The phylogeographic analyses provide strong cross-validation of epidemic pathways and identify the dominant pathways for use in other epidemiological and prophylactic studies. The temporo-geospatial characteristics of infection transmission provide corroborating but novel evidence for the rapid genesis of H5N1 lineages in S.E. Asia. The proposed method pinpoints several regions, particularly in the southern hemisphere, as candidates for increased surveillance.

High Risk Species used for Analysis

An automatic model-generation methodology that transforms Geographic Information System (GIS) datasets into a SEARUMS model has been used to generate a model consisting of the following 22 high-risk waterfowl species from which H5N1 viral strains have been isolated in the past:

Species Name               Population  Flocks/Agents   Model Image
Aix sponsa                  3,500,000             64   Asponsa.png
Amazonetta brasiliensis       110,000            103   Abrasili.png
Anas acuta                  5,300,000            372   Aacuta.png
Anas bahamensis               640,000             78   Abahamen.png
Anas platalea                 500,000             47   Aplatela.png
Anas platyrhynchos         19,000,000            557   Aplathyr.png
Anas sibilatrix               250,000             30   Asibilat.png
Anas versicolor               126,000             42   Aversico.png
Anser anser                     1,000              4   Aanser.png
Anser indicus                  56,000             11   Aindicus.png
Aythya ferina               2,200,000            213   Aferina.png
Aythya fuligula             2,600,000            148   Afuligul.png
Aythya marila               1,200,000            114   Amarila.png
Branta canadensis           5,500,000            169   Bcanaden.png
Callonetta leucophrys          50,000              3   Cleucoph.png
Cygnus melanocoryphus          50,000             32   Cmelanoc.png
Melanitta nigra             2,100,000             96   Mnigra.png
Mergellus albellus            130,000             71   Malbellu.png
Netta peposaca              1,000,000             26   Npeoposac.png
Anas crecca                 5,900,000            403   Acrecca.png
Philomachus pugnax          4,200,000            210   Aplatyrh.png
Porzana pusilla                21,300            262   Ppusilla.png
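The table gives, for each species, the population and the number of flock agents used to represent it. The exact GIS-to-SEARUMS transformation is not detailed in these materials, so the following Python sketch only illustrates one plausible approach under stated assumptions: flock agents are scattered uniformly inside a species' range polygon by rejection sampling, and the population is split evenly among them. The functions point_in_polygon and place_flocks, the even split, and the toy polygon are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray-casting test: inside if a ray towards +x crosses an odd number of edges."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def place_flocks(poly: List[Point], n_flocks: int, population: int,
                 rng: random.Random) -> List[Tuple[Point, int]]:
    """Hypothetical: place n_flocks agents uniformly inside poly, split population evenly."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    positions: List[Point] = []
    while len(positions) < n_flocks:
        # Rejection sampling: draw from the bounding box, keep points inside the polygon.
        candidate = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(candidate, poly):
            positions.append(candidate)
    base, extra = divmod(population, n_flocks)
    # The first `extra` flocks absorb the remainder so sizes sum to the population.
    sizes = [base + (1 if i < extra else 0) for i in range(n_flocks)]
    return list(zip(positions, sizes))


if __name__ == "__main__":
    # Toy L-shaped range polygon (not real GIS data), populated with the
    # Anser indicus figures from the table: 56,000 birds in 11 flocks.
    range_poly = [(70.0, 25.0), (100.0, 25.0), (100.0, 30.0),
                  (80.0, 30.0), (80.0, 40.0), (70.0, 40.0)]
    flocks = place_flocks(range_poly, 11, 56000, random.Random(1))
    print(len(flocks), sum(size for _, size in flocks))
```

Sampling uniformly in latitude and longitude is not equal-area, which is tolerable for a sketch but would bias placement within large, high-latitude ranges.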

Model video

A video containing images of the model of the 22 bird species in the above table (1 second per species) is available here: model video. The GIS data for the various species were manually obtained from the GROMS database, while migratory characteristics were obtained from BirdLife International.

Phase #1: Epidemiological Analysis Dataset

The XML model descriptions of the various species were merged to obtain the final combined model. The species in the model are color-coded to distinguish them, and migratory pathways are omitted for clarity. The initial starting zone was set to the wintering zone. The positions of the flocks in the wintering and summering zones are shown in the images below:

Wintering zone
Summering zone
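The merge step described above can be pictured with Python's standard xml.etree.ElementTree module. The SEARUMS XML schema is not reproduced in these materials, so the element and attribute names below (Model, Species, Flock, winterLat, and so on) are hypothetical placeholders. The sketch only shows the idea of concatenating per-species descriptions into one model and reading each flock's initial position from the wintering zone.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def merge_models(xml_texts: Iterable[str]) -> ET.Element:
    """Append every <Species> element from each per-species document to one <Model> root."""
    merged = ET.Element("Model")
    for text in xml_texts:
        root = ET.fromstring(text)
        for species in root.findall("Species"):
            merged.append(species)
    return merged


def initial_positions(model: ET.Element,
                      zone: str = "winter") -> List[Tuple[str, str, float, float]]:
    """Return (species, flock id, lat, lon) using `zone` as the starting zone."""
    positions = []
    for species in model.findall("Species"):
        for flock in species.findall("Flock"):
            # Hypothetical attribute naming: <zone>Lat / <zone>Lon.
            lat = float(flock.get(f"{zone}Lat"))
            lon = float(flock.get(f"{zone}Lon"))
            positions.append((species.get("name"), flock.get("id"), lat, lon))
    return positions


if __name__ == "__main__":
    # Toy documents with made-up coordinates.
    a = ('<Model><Species name="A"><Flock id="0" winterLat="25.0" winterLon="85.0" '
         'summerLat="35.0" summerLon="100.0"/></Species></Model>')
    b = ('<Model><Species name="B"><Flock id="0" winterLat="-30.0" winterLon="-60.0" '
         'summerLat="-20.0" summerLon="-55.0"/></Species></Model>')
    model = merge_models([a, b])
    for row in initial_positions(model):
        print(row)
```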

The epidemiological analysis to determine infection spread patterns was conducted by seeding a single infection in one waterfowl flock in Guangdong, China (at 23° 21' 36.53" N, 113° 36' 25.89" E). The model was configured so that the basic reproductive number of the infection (R0) is greater than 1, reflecting the enzootic nature of the infection. Furthermore, the disease transmission parameters were configured to reflect a Susceptible → Infected (SI) compartmental epidemiological model. The model was simulated for a period of three years while logging the location of each infection occurring in the model. A video illustrating the spread of infection is available via the following URL: Video of spread of infection.
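A minimal discrete-time sketch of the SI seeding-and-logging procedure is shown below. It is not the SEARUMS simulator: flock positions are held fixed (migration is omitted), transmission is assumed to occur only between flocks within a hypothetical contact radius contact_km, and beta is a hypothetical per-day transmission probability standing in for the configured transmission parameters. Because the model is SI, infected flocks never recover, and every transmission is logged as a (day, infector, infectee) event, which is the raw material for the infection graph discussed next.

```python
import math
import random
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def simulate_si(positions: Dict[str, LatLon], source: str, beta: float,
                contact_km: float, days: int, seed: int = 0) -> List[Tuple[int, str, str]]:
    """Seed one infected flock and run a synchronous SI process, logging each transmission."""
    rng = random.Random(seed)
    infected = {source}
    log: List[Tuple[int, str, str]] = []
    for day in range(1, days + 1):
        newly: Dict[str, str] = {}  # infectee -> infector for this day
        for src in sorted(infected):
            for dst in sorted(positions):
                if dst in infected or dst in newly:
                    continue  # SI: a flock is infected at most once
                close = haversine_km(positions[src], positions[dst]) <= contact_km
                if close and rng.random() < beta:
                    newly[dst] = src
        for dst, src in newly.items():
            log.append((day, src, dst))
        infected.update(newly)  # new infections become infectious the next day
    return log


if __name__ == "__main__":
    # Toy flocks about 4.4 km apart along a meridian, plus one distant flock.
    flocks = {"A": (23.36, 113.61), "B": (23.40, 113.61),
              "C": (23.44, 113.61), "D": (40.0, 0.0)}
    print(simulate_si(flocks, "A", beta=1.0, contact_km=5.0, days=3))
```

With beta = 1.0 and contact_km = 5.0, the toy flocks produce a simple chain of transmissions A → B → C, while the distant flock D is never reached.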

Epidemiological analysis and Infection Graph

The dominant infection locations observed in several objects are shown in the second figure below. The locations in the figure are color-coded to reflect the number of intermediate hosts between each infection and the source infection in Guangdong, China (at 23° 21' 36.53" N, 113° 36' 25.89" E), which corresponds to the root (A/goose/Guangdong/1996) of the revised H5N1 nomenclature phylogram. The locations of infection transmissions indicate potential areas for secondary outbreaks, and an increased density of outbreaks in turn increases the probability of human outbreaks. The high-risk areas reported by WHO are highlighted in bright orange in the second figure. The figures highlight the overlap between the dense outbreaks identified by epidemiological analysis and the regions with human cases reported by WHO.
High-risk areas with human cases as reported by WHO
Infection graph
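The color coding by distance from the Guangdong source can be computed from logged transmission events with a breadth-first search. The sketch below assumes infection events are available as hypothetical (day, infector, infectee) tuples, and it interprets "number of intermediate hosts" as the number of transmission hops from the source (the count of strictly intermediate hosts would be one less). The palette is a hypothetical stand-in for the colors used in the figure.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

# Hypothetical color ramp; the colors used in the actual figure are not specified.
PALETTE = ["red", "orange", "yellow", "green", "blue"]


def hops_from_source(events: Iterable[Tuple[int, str, str]], source: str) -> Dict[str, int]:
    """Breadth-first search over infector -> infectee edges; returns hops from the source."""
    children: Dict[str, List[str]] = defaultdict(list)
    for _day, infector, infectee in events:
        children[infector].append(infectee)
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in hops:  # in BFS the first visit is along a shortest path
                hops[child] = hops[node] + 1
                queue.append(child)
    return hops


def color_for(hops: int) -> str:
    """Map a hop count to a palette entry, clamping deep generations to the last color."""
    return PALETTE[min(hops, len(PALETTE) - 1)]


if __name__ == "__main__":
    events = [(1, "A", "B"), (2, "B", "C"), (2, "A", "D"), (5, "C", "E")]
    for flock, h in sorted(hops_from_source(events, "A").items()):
        print(flock, h, color_for(h))
```

Because each flock is infected at most once in an SI model, the logged events form a tree rooted at the source, so the BFS depth equals the infection generation.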




Phase #2: Phylogeographic Analysis Datasets

The phylogram generation procedure adopted by the WHO/OIE/FAO H5N1 Evolution Working Group was used to generate a phylogenetic tree from 2,417 H5N1 hemagglutinin (HA) segments. The viral strains were obtained from the GISAID EpiFlu database by restricting the search to sequences longer than 1,600 nucleotides (nt), which corresponds to 90% of the open reading frame length. Furthermore, the EpiFlu search query was restricted to a 3-year time period from 2006 to 2009 (inclusive), corresponding to the 3-year period used for the epidemiological analysis in Phase 1. A multiple sequence alignment of the 2,417 H5N1 HA sequences was generated using MUSCLE (version 3.7) with 16 iterations. A large unrooted neighbor-joining tree of the 2,417 H5N1 HA strains was constructed using a GTR+I+Γ model in PAUP* v4.0b10. The Newick form of the phylogram generated by PAUP* was used to categorize leaves into clades such that the average percentage pairwise nucleotide distances between and within clades are > 1.5% and < 1.5%, respectively, concordant with the WHO/OIE/FAO clade definition criteria.
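The clade-partitioning criterion can be checked with a short Python function. As a simplifying assumption, the sketch measures distance as the uncorrected percentage of differing sites between aligned sequences (ignoring gaps and ambiguous bases) rather than the GTR+I+Γ model distances used to build the tree; p_distance_pct, clade_distance_summary, and meets_clade_criteria are hypothetical names. A proposed partition of leaves into clades passes only if every mean within-clade distance is below 1.5% and every mean between-clade distance is above it.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

VALID = set("ACGT")


def p_distance_pct(a: str, b: str) -> float:
    """Percentage of differing sites, counting only sites where both bases are A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in VALID and y in VALID]
    if not sites:
        raise ValueError("no comparable sites")
    return 100.0 * sum(x != y for x, y in sites) / len(sites)


def clade_distance_summary(alignment: Dict[str, str], clades: Dict[str, List[str]]):
    """Mean pairwise distance within each clade and between each pair of clades."""
    within: Dict[str, float] = {}
    for name, members in clades.items():
        pairs = list(combinations(members, 2))
        within[name] = (mean(p_distance_pct(alignment[x], alignment[y]) for x, y in pairs)
                        if pairs else 0.0)  # a singleton clade has no within-clade pairs
    between: Dict[Tuple[str, str], float] = {}
    for c1, c2 in combinations(sorted(clades), 2):
        between[(c1, c2)] = mean(p_distance_pct(alignment[x], alignment[y])
                                 for x in clades[c1] for y in clades[c2])
    return within, between


def meets_clade_criteria(within, between, threshold: float = 1.5) -> bool:
    """True if all within-clade means < threshold and all between-clade means > threshold."""
    return (all(d < threshold for d in within.values())
            and all(d > threshold for d in between.values()))


if __name__ == "__main__":
    # Toy 100-site alignment, not real HA sequences.
    aln = {"s1": "A" * 100, "s2": "A" * 99 + "C",
           "s3": "C" * 10 + "A" * 90, "s4": "C" * 10 + "A" * 89 + "T"}
    w, b = clade_distance_summary(aln, {"X": ["s1", "s2"], "Y": ["s3", "s4"]})
    print(w, b, meets_clade_criteria(w, b))
```

The WHO/OIE/FAO definition also involves tree topology and bootstrap support, which this distance-only check does not attempt to reproduce.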

The primary dataset of H5N1 strains used for the analysis and the various resulting files are listed below:

Results (Phase #3): Phylogeographically annotated Epidemiology Datasets

The last phase of the proposed methodology utilizes the phylogeographic data extracted in Phase 2 to identify and annotate strong infection pathways in the infection graph generated in Phase 1. In this phase, the latitude and longitude values for each pair of vertices constituting an edge in the infection graph are reverse-geocoded to identify correspondence with the phylogeographic data. Geocoding is performed using the GeoNames database in multiple passes with an increasing matching radius (1/50, 1/25, and 1 mile) to identify higher-resolution regional-level geocodes. The resulting phylogeographically annotated infection graphs for the summering and wintering zones are shown below:

Summering Zone
Wintering Zone
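The multi-pass lookup described for Phase 3 can be sketched as follows. To avoid implying any particular GeoNames client API, the per-pass lookup is abstracted as a query(lat, lon, radius_miles) callable, and make_local_query is a hypothetical in-memory stand-in that searches a list of (name, lat, lon) records such as might be loaded from a GeoNames gazetteer dump. Radii are tried in the order given in the text (1/50, 1/25, then 1 mile), and the nearest match from the first pass that finds anything is returned.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Place = Tuple[str, float, float]  # (name, latitude, longitude)
Query = Callable[[float, float, float], List[str]]
EARTH_RADIUS_MILES = 3958.8
RADII_MILES = (1 / 50, 1 / 25, 1.0)  # passes described in the text


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def make_local_query(places: Sequence[Place]) -> Query:
    """Hypothetical stand-in for a radius-limited gazetteer lookup; nearest match first."""
    def query(lat: float, lon: float, radius: float) -> List[str]:
        hits = [(haversine_miles(lat, lon, plat, plon), name) for name, plat, plon in places]
        return [name for d, name in sorted(hits) if d <= radius]
    return query


def multi_pass_geocode(lat: float, lon: float, query: Query,
                       radii: Sequence[float] = RADII_MILES) -> Optional[Tuple[str, float]]:
    """Try increasingly large radii; return (place, radius) from the first pass with a match."""
    for radius in radii:
        matches = query(lat, lon, radius)
        if matches:
            return matches[0], radius
    return None


if __name__ == "__main__":
    # Toy gazetteer records, not real GeoNames entries.
    q = make_local_query([("Point X", 23.3601, 113.6070), ("Point Y", 23.02, 113.12)])
    print(multi_pass_geocode(23.36, 113.607, q))
    print(multi_pass_geocode(23.37, 113.607, q))
```

Each endpoint of an infection-graph edge could be passed through multi_pass_geocode, with the returned radius recording how coarse the match was. With an exhaustive in-memory search, the staged radii return the same place as a single nearest-neighbor search capped at 1 mile; the staging matters mainly when each pass is a separate radius-limited query.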

The infection graphs corresponding to the summering and wintering zones, generated by annotating edges (infection pathways) using phylogeographic annotations, are shown above. Note that these are the regions where the flocks predominantly roost, and they are the primary locations for cross-species infections. A significant variation in the number of edges annotated by influential clades between the summering and wintering seasons was observed. The number of influential clades increases in the wintering zone because the birds migrate south and spread out across many countries. This time frame also coincides with an increase in influenza epidemics in the temperate and subtropical regions, thereby increasing the potential for reassortment between human and avian influenza viruses and the emergence of novel, possibly highly virulent strains that cause mortality in humans. These inferences are consistent with prior investigations reported by various researchers and multinational surveillance organizations. However, the significant evidence correlating migratory patterns of high-risk waterfowl species with the "breeding grounds" of novel H5N1 strains using phylogeographic-epidemiological analysis is an original inference from this research. In addition to providing a complementary perspective on the ecological aspects of avian influenza, the inferences increase confidence in the proposed methodology.
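The seasonal comparison of annotated edges can be illustrated with a small counting sketch. The matching rule used here is an assumption for illustration only: an edge between two geocoded regions is annotated with a clade whenever isolates of that clade were sampled in both regions. The clade labels, region names, and edge lists are toy placeholders, not results from this study.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (region of infecting flock, region of infected flock)


def clade_edge_counts(edges: Iterable[Edge], clade_regions: Dict[str, Set[str]]) -> Counter:
    """Count, per clade, the edges whose two endpoint regions both contain isolates of that clade."""
    counts: Counter = Counter()
    for u, v in edges:
        for clade, regions in clade_regions.items():
            if u in regions and v in regions:
                counts[clade] += 1
    return counts


if __name__ == "__main__":
    clade_regions = {"clade_A": {"R1", "R2"}, "clade_B": {"R2", "R3"}, "clade_C": {"R3", "R4"}}
    summering = [("R1", "R2"), ("R1", "R2")]
    wintering = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R4", "R5")]
    for season, edges in (("summering", summering), ("wintering", wintering)):
        counts = clade_edge_counts(edges, clade_regions)
        # Number of distinct annotating clades, then edges annotated per clade.
        print(season, len(counts), dict(counts))
```

Comparing len(counts) between the two seasons gives the number of distinct annotating clades per season, and the per-clade totals show how annotated edges are distributed across clades.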