What is a "signal"?
A signal is one credible mention of hantavirus activity from one of our vetted sources. It is the unit of data on the map, in the stats bar, in the per-country pages, and in the JSON API. Each signal has a source, a country, a language, a timestamp and a link to the original publication.
What a signal represents depends on the source:
- US (CDC NNDSS): actual confirmed weekly case counts from the National Notifiable Diseases Surveillance System. Pulled from the public Socrata API at data.cdc.gov. State-by-state breakdown, MMWR week dates. These numbers are confirmed cases, not mentions.
- WHO Disease Outbreak News, ECDC, PAHO: official outbreak announcements and risk assessments when published. Low frequency, high authority.
- News (Google News in 17 languages, GDELT, ProMED): mentions, not case counts. Includes regional public-health press releases that frequently appear in local-language news before WHO or ECDC update their dashboards. Use as a leading indicator.
Signals are not confirmed case counts outside the US. We deliberately do not estimate, extrapolate, or sum signals to imply cases. The unit is "vetted mention in the last 30 days," not "patient diagnosed." When confirmed counts are available from official surveillance feeds (currently US only via NNDSS), they are clearly labelled with the CDC NNDSS source tag.
Country-level surveillance feeds for the rest of the world (RKI in Germany, Rospotrebnadzor in Russia, KDCA in South Korea, etc.) are generally not available as machine-readable APIs. RKI publishes a weekly Wochenbericht as a PDF; we link it as a citation on country pages but do not currently ingest it. Adding country-specific surveillance scrapers is on the roadmap.
Pipeline overview
- Ingest. A scheduled worker (node-cron, in-container) pulls each configured source on its declared cadence: weekly for CDC NNDSS, every 15 minutes for Google News and GDELT, etc.
- Normalise. Each ingestor produces a uniform Signal shape: id, source, sourceCode, category, rank, title, summary, url, language, countryIso2, publishedAt, ingestedAt.
- Resolve. News URLs are de-redirected (e.g. resolving Google News redirector links to canonical publisher URLs); language is detected; country is inferred from the source-feed metadata or text.
- De-duplicate. Signals are keyed by canonical-URL hash; duplicates are merged with first-seen publishedAt retained.
- Classify. Signals are tagged by category (official / news / surveillance / advisory) and rank (1 / 2 / 3) based on source.
- Snapshot. A single JSON snapshot is written to a Docker volume (runtime-data/snapshot.json). The Astro server reads this file on each API request and caches it with a short TTL.
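The Normalise and De-duplicate steps can be sketched roughly as follows. The field names come from the list above; the field types, the nullable countryIso2, and the dedupe helper are assumptions made for illustration. For brevity the sketch keys on the canonical URL string itself rather than a hash of it, and it assumes the later duplicate's fields overwrite the earlier ones except for publishedAt.

```typescript
// Field names are from the Normalise step; all types are assumptions.
type Category = "official" | "news" | "surveillance" | "advisory";

interface Signal {
  id: string;
  source: string;
  sourceCode: string;
  category: Category;
  rank: 1 | 2 | 3;
  title: string;
  summary: string;
  url: string; // assumed to be canonical already (after the Resolve step)
  language: string;
  countryIso2: string | null; // null when unattributed (assumption)
  publishedAt: string; // ISO 8601 timestamp (assumption)
  ingestedAt: string; // ISO 8601 timestamp (assumption)
}

// Hypothetical helper: merge signals that share a canonical URL.
// The merged record takes the newest fields but keeps the first-seen publishedAt.
function dedupe(signals: Signal[]): Signal[] {
  const byUrl = new Map<string, Signal>();
  for (const s of signals) {
    const seen = byUrl.get(s.url);
    byUrl.set(s.url, seen ? { ...s, publishedAt: seen.publishedAt } : s);
  }
  return Array.from(byUrl.values());
}
```

Because a Map preserves insertion order, the output keeps signals in first-seen order, which makes the merge deterministic for a given input sequence.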
Country attribution
A signal is attributed to a country using a three-tier hierarchy. This is the same pattern used by serious news-based surveillance systems (HealthMap, ProMED, GPHIN, EIOS): read the article, not the publisher's home address.
- Tier 1, source-authoritative. CDC NNDSS, WHO Disease Outbreak News, ECDC and PAHO publish structured country fields with their data. These are trusted as-is and tagged attributionMethod: "source-authoritative".
- Tier 2, content match. For news articles, we scan the title (highest weight) and summary against a multilingual country-name gazetteer derived from Unicode CLDR (via i18n-iso-countries) plus stem overrides for inflected languages (Russian, Polish, Greek, Turkish). Matches use word boundaries and longest-name-first ordering so "Georgia, United States" pins US, not Georgia. Strain and virus names that contain place words (Sin Nombre, Andes, Seoul, Hantaan, Puumala, Dobrava, Choclo, Laguna Negra) are stripped before matching. Tagged attributionMethod: "title-match" or "summary-match".
- Tier 3, unattributed. If no country name appears in title or summary, the signal is kept in the global and per-language feeds but contributes to no country's pin. Tagged attributionMethod: "unattributed". The feed's geo-target is never used as a primary attribution. A Portuguese article from Portugal's Google News feed that talks about an Argentine outbreak attributes to Argentina, not Portugal.
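A minimal sketch of the Tier 2 and Tier 3 logic, under stated assumptions: the five-entry gazetteer is a toy stand-in for the CLDR-derived one, the "Georgia, United States" entry is one hypothetical way to get the US-state disambiguation, and trying the title before the summary is a simplification of "highest weight". The function names (wholeWord, stripStrainNames, matchCountries, attribute) are illustrative, not the project's own.

```typescript
type AttributionMethod = "title-match" | "summary-match" | "unattributed";

// Toy gazetteer of [name, ISO 3166-1 alpha-2] pairs (illustrative only).
const GAZETTEER: Array<[string, string]> = [
  ["Georgia, United States", "US"], // hypothetical disambiguation entry
  ["United States", "US"],
  ["Argentina", "AR"],
  ["Georgia", "GE"],
  ["Chile", "CL"],
];

// Strain and virus names listed in the text above.
const STRAIN_NAMES = [
  "Sin Nombre", "Andes", "Seoul", "Hantaan",
  "Puumala", "Dobrava", "Choclo", "Laguna Negra",
];

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Unicode-aware whole-word pattern; JavaScript's \b only knows ASCII word characters.
function wholeWord(phrase: string): RegExp {
  return new RegExp(
    `(?<![\\p{L}\\p{N}])${escapeRegExp(phrase)}(?![\\p{L}\\p{N}])`,
    "giu",
  );
}

function stripStrainNames(text: string): string {
  return STRAIN_NAMES.reduce((t, name) => t.replace(wholeWord(name), " "), text);
}

// Longest names first; each matched span is blanked so that a shorter name
// inside it (e.g. "Georgia") cannot match afterwards.
function matchCountries(text: string): string[] {
  let remaining = stripStrainNames(text);
  const found = new Set<string>();
  const longestFirst = GAZETTEER.slice().sort((a, b) => b[0].length - a[0].length);
  for (const [name, iso2] of longestFirst) {
    const blanked = remaining.replace(wholeWord(name), " ");
    if (blanked !== remaining) {
      found.add(iso2);
      remaining = blanked;
    }
  }
  return Array.from(found);
}

function attribute(
  title: string,
  summary: string,
): { countries: string[]; method: AttributionMethod } {
  const fromTitle = matchCountries(title);
  if (fromTitle.length > 0) return { countries: fromTitle, method: "title-match" };
  const fromSummary = matchCountries(summary);
  if (fromSummary.length > 0) return { countries: fromSummary, method: "summary-match" };
  return { countries: [], method: "unattributed" };
}
```

Under these assumptions, a title such as "Hantavirus case confirmed in Georgia, United States" resolves to US only, and a title naming both Argentina and Chile returns both codes. The sketch does not cover the stem overrides for inflected languages.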
A single article can attribute to multiple countries when it explicitly mentions several (e.g. "Argentina and Chile outbreaks" emits two signals, one per country). Each per-country signal carries the same source, title and URL.
Known limitations. News-based attribution caps out at roughly 80% accuracy in published evaluations (HealthMap, Freifeld et al. 2008). Very small countries the gazetteer omits, and articles written about cities or regions without naming the country, produce false negatives. False positives are minimised by the strain-name stripping, US-state disambiguation for Georgia, and the content-not-publisher rule. The breakdown by attribution method for any country is exposed at /api/countries/<slug>.json under stats.attribution so consumers can judge data quality.
Country-level classification
The pin colour on the map encodes a level, not a count:
- Local: case, death, or active outbreak signal in the country.
- Imported: case present but exposure occurred elsewhere (returnee, repatriation, treatment).
- Response: only travel advisory, screening or quarantine policy signals; no local case.
- Inactive: no signals in the rolling 30-day window.
Levels are assigned by deterministic rules from the categorised signals; we do not use ML for this classification.
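As an illustration of what such deterministic rules could look like, here is a small sketch. The evidence tags, the TaggedSignal type, the precedence order (local over imported over response), and the use of publishedAt for the 30-day window are all assumptions; the text above does not spell out the actual rules.

```typescript
type Level = "local" | "imported" | "response" | "inactive";

// Hypothetical per-signal tags; the real categorisation may differ.
type Evidence = "local-case" | "imported-case" | "response-measure";

interface TaggedSignal {
  evidence: Evidence;
  publishedAt: string; // ISO 8601 timestamp (assumption)
}

const WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // rolling 30-day window

function classifyCountry(signals: TaggedSignal[], now: Date): Level {
  const cutoff = now.getTime() - WINDOW_MS;
  const recent = signals.filter((s) => Date.parse(s.publishedAt) >= cutoff);

  // Assumed precedence: any local evidence outranks imported, which outranks response.
  if (recent.some((s) => s.evidence === "local-case")) return "local";
  if (recent.some((s) => s.evidence === "imported-case")) return "imported";
  if (recent.length > 0) return "response"; // only response measures remain
  return "inactive"; // nothing inside the window
}
```

Passing the current time in as a parameter keeps the function pure, so the same inputs always yield the same level.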
Freshness
- Fresh: last successful ingestion ≤ 60 min ago.
- Stale: 60 min – 6 h since last ingestion.
- Unknown: no successful ingestion yet.
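The thresholds above map onto a small pure function. This is a sketch only: the function name and signature are hypothetical, treating the 6 h boundary as inclusive is an assumption, and because the list does not say how a source older than 6 h is labelled, the sketch returns null for that case instead of guessing.

```typescript
type Freshness = "fresh" | "stale" | "unknown";

const MINUTE_MS = 60 * 1000;

// lastSuccess is null when no ingestion has succeeded yet.
function freshness(lastSuccess: Date | null, now: Date): Freshness | null {
  if (lastSuccess === null) return "unknown";
  const ageMs = now.getTime() - lastSuccess.getTime();
  if (ageMs <= 60 * MINUTE_MS) return "fresh";
  if (ageMs <= 6 * 60 * MINUTE_MS) return "stale";
  return null; // older than 6 h: not specified in the list above
}
```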
What we don't do
- We do not estimate or extrapolate case counts. Numbers shown reflect signal mentions, not confirmed cases. Confirmed cases come from official surveillance feeds and are clearly labelled.
- We do not use generative-AI summaries for headlines. We display the publisher's title verbatim.
- We do not store article body text. We link to the source.
- We do not set tracking cookies, run third-party trackers, or identify individual visitors. We do run self-hosted, cookieless Plausible CE for aggregate page-view counts (see privacy).
Open data
The complete signal feed is available at /api/signals.json
under CC BY 4.0. Country-level summary at /api/countries.json.
Source health at /api/health.json. RSS at
/feed.xml.