Hantaflow
Technical

Methodology

This page describes the pipeline from authoritative source to live map pin. It is designed to be auditable: every signal is sourced, every source is listed, and the code is open.

Pipeline overview

  1. Ingest. A scheduled worker (node-cron, in-container) pulls each configured source on its declared cadence — weekly for CDC NNDSS, every 15 minutes for Google News and GDELT, etc.
  2. Normalise. Each ingestor produces a uniform Signal shape: id, source, sourceCode, category, rank, title, summary, url, language, countryIso2, publishedAt, ingestedAt.
  3. Resolve. News URLs are de-redirected (e.g. resolving Google News redirector links to canonical publisher URLs); language is detected; country is inferred from the source-feed metadata or text.
  4. De-duplicate. Signals are keyed by a hash of their canonical URL; duplicates are merged, retaining the first-seen publishedAt.
  5. Classify. Signals are tagged by category (official / news / surveillance / advisory) and rank (1 / 2 / 3) based on source.
  6. Snapshot. A single JSON snapshot is written to a Docker volume (runtime-data/snapshot.json). The Astro server reads this file on each API request and caches it with a short TTL.
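The normalised Signal shape from step 2 and the de-duplication keying from step 4 can be sketched as follows. The field names come from the pipeline description; the field types, optionality, and the choice of SHA-256 are assumptions:

```typescript
import { createHash } from "node:crypto";

// Uniform shape produced by every ingestor (step 2). Field list is from
// the methodology text; types are assumed for illustration.
interface Signal {
  id: string;
  source: string;
  sourceCode: string;
  category: "official" | "news" | "surveillance" | "advisory";
  rank: 1 | 2 | 3;
  title: string;
  summary: string;
  url: string;
  language: string;
  countryIso2: string | null;
  publishedAt: string; // ISO 8601
  ingestedAt: string;  // ISO 8601
}

// Step 4: signals are keyed by a hash of their canonical URL.
function signalKey(canonicalUrl: string): string {
  return createHash("sha256").update(canonicalUrl).digest("hex");
}

// Merging a duplicate keeps the first-seen publishedAt while taking the
// rest of the fields from the newer copy.
function mergeDuplicate(firstSeen: Signal, incoming: Signal): Signal {
  return { ...incoming, publishedAt: firstSeen.publishedAt };
}
```

Keying on the canonical URL (after step 3's de-redirecting) means the same story reached through different redirectors collapses to a single signal.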
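Step 6's read path, re-reading the snapshot file at most once per TTL window, amounts to a short-TTL cache. A minimal generic sketch, assuming the path and TTL value (neither is specified here):

```typescript
import { readFile } from "node:fs/promises";

// Generic short-TTL cache: calls `load` at most once per `ttlMs` window,
// returning the cached value otherwise.
function ttlCached<T>(load: () => Promise<T>, ttlMs: number): () => Promise<T> {
  let cached: { value: T; loadedAt: number } | null = null;
  return async () => {
    const now = Date.now();
    if (cached && now - cached.loadedAt < ttlMs) return cached.value;
    cached = { value: await load(), loadedAt: now };
    return cached.value;
  };
}

// Usage sketch: the API endpoint would wrap the snapshot read in the
// cache. The 30-second TTL is an assumption, not the project's value.
const readSnapshot = ttlCached(
  async () => JSON.parse(await readFile("runtime-data/snapshot.json", "utf8")),
  30_000,
);
```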

Country attribution

A signal is attributed to a country only when the attribution is explicit: the feed itself is configured for a specific country, or the source's structured metadata names one.

We deliberately do not attempt fine-grained named-entity recognition for country attribution from free text. The risk of misattribution outweighs the coverage gain.
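Under that policy, attribution reduces to a short, explicit rule chain. The field and type names below are illustrative, not the project's actual schema:

```typescript
// Country attribution is deliberately conservative: only explicit,
// structured evidence counts; free text is never mined for country names.
interface FeedConfig {
  countryIso2?: string; // set when the feed itself is country-specific
}

interface FeedItemMeta {
  countryIso2?: string; // e.g. a structured location field in the feed item
}

function attributeCountry(feed: FeedConfig, item: FeedItemMeta): string | null {
  // 1. A country-specific source wins outright.
  if (feed.countryIso2) return feed.countryIso2;
  // 2. Otherwise use structured per-item metadata, if present.
  if (item.countryIso2) return item.countryIso2;
  // 3. Never guess from free text: leave the signal unattributed.
  return null;
}
```

Returning null rather than a best guess is the point: an unattributed signal still appears in the feed, it just never places a pin on the wrong country.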

Country level classification

The pin colour on the map encodes a level, not a count.

Levels are assigned by deterministic rules from the categorised signals; we do not use ML for this classification.
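As an illustration of what "deterministic rules, no ML" means here, a level assignment might look like the sketch below. The level scale and the specific rules are hypothetical, since this section does not enumerate them; only the category and rank vocabulary comes from the pipeline description:

```typescript
type Category = "official" | "news" | "surveillance" | "advisory";

interface CategorisedSignal {
  category: Category;
  rank: 1 | 2 | 3;
}

// Hypothetical 0–3 scale for illustration; the real level names and
// rule set live in the codebase.
type Level = 0 | 1 | 2 | 3;

function countryLevel(signals: CategorisedSignal[]): Level {
  // Deterministic, source-based rules: the level depends on which kinds
  // of sources are present, never on how many signals there are.
  if (signals.some((s) => s.category === "official" && s.rank === 1)) return 3;
  if (signals.some((s) => s.category === "official" || s.category === "surveillance")) return 2;
  if (signals.length > 0) return 1;
  return 0;
}
```

Because the rules are pure functions of the categorised signals, the same snapshot always yields the same map, which is what makes the classification auditable.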

Freshness

What we don't do

Open data

The complete signal feed is available at /api/signals.json under CC BY 4.0; the country-level summary at /api/countries.json; source health at /api/health.json; and an RSS feed at /feed.xml.