NOAA ISD Weather Station

Download, Parse, Visualize Integrated Surface Dataset.

If you have a database and don’t know what to do with it, check out another open source project by the author: Vonng/isd

You can directly reuse Grafana from an existing monitoring system to interactively explore sub-hourly weather data from nearly 30,000 surface weather stations spanning the past 120 years.

ISD – Integrated Surface Data

All the tools you need to download, parse, process, and visualize the NOAA ISD dataset are included here. They give you access to sub-hourly weather data from nearly 30,000 surface weather stations over the last 120 years, and let you experience the power of PostgreSQL for data analysis and processing.

SYNOPSIS

Download, Parse, Visualize Integrated Surface Dataset.

It covers roughly 30,000 meteorological stations, with sub-hourly observation records from 1900 to 2020.

[Figure 1: ISD Dataset]

Quick Start

  1. Clone repo

    git clone https://github.com/Vonng/isd && cd isd
  2. Prepare a postgres database

    Connect with a URL such as `isd` or `postgres://user:pass@host/dbname`.

    # skip this if you already have a viable database
    PGURL=postgres
    psql ${PGURL} -c 'CREATE DATABASE isd;'

    # database connection string, something like `isd` or `postgres://user:pass@host/dbname`
    PGURL='isd'
    psql ${PGURL} -AXtwc 'CREATE EXTENSION postgis;'

    # create tables, partitions, functions
    psql ${PGURL} -AXtwf 'sql/schema.sql'
  3. Download data

    • ISD Station: Station metadata, id, name, location, country, etc…
    • ISD History: Station observation records: observation count per month
    • ISD Hourly: Yearly archived station (sub-)hourly observation records
    • ISD Daily: Yearly archived station daily aggregated summary
    git clone https://github.com/Vonng/isd && cd isd
    bin/get-isd-station.sh          # download isd station metadata from noaa (a proxy makes it faster)
    bin/get-isd-history.sh          # download isd history observation from noaa
    bin/get-isd-hourly.sh <year>    # download isd hourly data (yearly tarball 1901-2020)
    bin/get-isd-daily.sh <year>     # download isd daily data (yearly tarball 1929-2020)
  4. Build Parser

    There are two ISD dataset parsers written in Go: isdh for the ISD hourly dataset and isdd for the ISD daily dataset.

    make isdh and make isdd will build them and copy the binaries to bin/. These parsers are required for loading data into the database.

    Alternatively, you can download pre-compiled binaries into the bin/ directory to skip this phase.

  5. Load data

    Metadata includes world_fences, china_fences, isd_elements, isd_mwcode, isd_station, and isd_history. These are gzipped CSV files located in data/meta/. world_fences, china_fences, isd_elements, and isd_mwcode are constant dictionary tables, but isd_station and isd_history are updated frequently, so you will have to download them from NOAA before loading.

    # load metadata: fences, dicts, station, history, ...
    bin/load-meta.sh

    # load a year's daily data into the database
    bin/load-isd-daily <year>

    # load a year's hourly data into the database
    bin/load-isd-hourly <year>

    Note that the original isd_daily dataset contains some uncleansed data; refer to the caveat section for details.
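The per-year download and load steps above can be combined into a simple loop. A minimal sketch, assuming the scripts accept a single year argument as shown earlier; the guard lets it degrade gracefully where the repo scripts are not present:

```shell
# Bulk-load several years of daily data (sketch).
# Assumes bin/get-isd-daily.sh and bin/load-isd-daily take one year argument.
loaded=0
for year in $(seq 2018 2020); do
    if [ -x bin/get-isd-daily.sh ]; then
        bin/get-isd-daily.sh "$year"    # fetch the yearly tarball from NOAA
        bin/load-isd-daily "$year"      # parse and COPY it into PostgreSQL
    fi
    loaded=$((loaded + 1))              # count processed years either way
done
echo "processed $loaded years"
```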

Data

Dataset

| Dataset     | Sample               | Document               | Comment                        |
|:------------|:---------------------|:-----------------------|:-------------------------------|
| ISD Hourly  | isd-hourly-sample.csv | isd-hourly-document.pdf | (Sub-)hourly observation records |
| ISD Daily   | isd-daily-sample.csv | isd-daily-format.txt   | Daily summary                  |
| ISD Monthly | N/A                  | isd-gsom-document.pdf  | Not used, generated from daily data |
| ISD Yearly  | N/A                  | isd-gsoy-document.pdf  | Not used, generated from monthly data |

Hourly data: original tarball size 105GB; table size 1TB (plus 600GB of indexes).

Daily data: original tarball size 3.2GB; table size 24GB.

It is recommended to have 2TB of storage for a full installation, and at least 40GB for a daily-data-only installation.

Schema

Data schema definition

Station

    CREATE TABLE public.isd_station
    (
        station    VARCHAR(12) PRIMARY KEY,
        usaf       VARCHAR(6) GENERATED ALWAYS AS (substring(station, 1, 6)) STORED,
        wban       VARCHAR(5) GENERATED ALWAYS AS (substring(station, 7, 5)) STORED,
        name       VARCHAR(32),
        country    VARCHAR(2),
        province   VARCHAR(2),
        icao       VARCHAR(4),
        location   GEOMETRY(POINT),
        longitude  NUMERIC GENERATED ALWAYS AS (Round(ST_X(location)::NUMERIC, 6)) STORED,
        latitude   NUMERIC GENERATED ALWAYS AS (Round(ST_Y(location)::NUMERIC, 6)) STORED,
        elevation  NUMERIC,
        period     daterange,
        begin_date DATE GENERATED ALWAYS AS (lower(period)) STORED,
        end_date   DATE GENERATED ALWAYS AS (upper(period)) STORED
    );
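The usaf and wban generated columns simply slice the station identifier: the first six characters are the USAF id, the next five the WBAN id. The same split in shell, using a made-up example id for illustration:

```shell
# Split an ISD station id the same way the generated columns do:
# substring(station, 1, 6) -> usaf, substring(station, 7, 5) -> wban.
# "72286023119" is just an example id.
station="72286023119"
usaf=$(printf '%s' "$station" | cut -c1-6)
wban=$(printf '%s' "$station" | cut -c7-11)
echo "usaf=$usaf wban=$wban"   # -> usaf=722860 wban=23119
```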

Hourly Data

    CREATE TABLE public.isd_hourly
    (
        station    VARCHAR(11) NOT NULL,
        ts         TIMESTAMP NOT NULL,
        temp       NUMERIC(3, 1),
        dewp       NUMERIC(3, 1),
        slp        NUMERIC(5, 1),
        stp        NUMERIC(5, 1),
        vis        NUMERIC(6),
        wd_angle   NUMERIC(3),
        wd_speed   NUMERIC(4, 1),
        wd_gust    NUMERIC(4, 1),
        wd_code    VARCHAR(1),
        cld_height NUMERIC(5),
        cld_code   VARCHAR(2),
        sndp       NUMERIC(5, 1),
        prcp       NUMERIC(5, 1),
        prcp_hour  NUMERIC(2),
        prcp_code  VARCHAR(1),
        mw_code    VARCHAR(2),
        aw_code    VARCHAR(2),
        pw_code    VARCHAR(1),
        pw_hour    NUMERIC(2),
        data       JSONB
    ) PARTITION BY RANGE (ts);
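The table is range-partitioned on ts, and sql/schema.sql is responsible for creating the partitions. For illustration only, a yearly partition would look like the following sketch (the partition names actually used by the repo may differ); the snippet just writes the DDL to a file for later use with psql:

```shell
# Sample DDL for one yearly partition of isd_hourly (names are assumptions);
# apply later with: psql "${PGURL}" -f /tmp/isd_hourly_part.sql
cat > /tmp/isd_hourly_part.sql <<'EOF'
CREATE TABLE isd_hourly_2020 PARTITION OF isd_hourly
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
EOF
```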

Daily Data

    CREATE TABLE public.isd_daily
    (
        station     VARCHAR(12) NOT NULL,
        ts          DATE NOT NULL,
        temp_mean   NUMERIC(3, 1),
        temp_min    NUMERIC(3, 1),
        temp_max    NUMERIC(3, 1),
        dewp_mean   NUMERIC(3, 1),
        slp_mean    NUMERIC(5, 1),
        stp_mean    NUMERIC(5, 1),
        vis_mean    NUMERIC(6),
        wdsp_mean   NUMERIC(4, 1),
        wdsp_max    NUMERIC(4, 1),
        gust        NUMERIC(4, 1),
        prcp_mean   NUMERIC(5, 1),
        prcp        NUMERIC(5, 1),
        sndp        NUMERIC(5, 1),
        is_foggy    BOOLEAN,
        is_rainy    BOOLEAN,
        is_snowy    BOOLEAN,
        is_hail     BOOLEAN,
        is_thunder  BOOLEAN,
        is_tornado  BOOLEAN,
        temp_count  SMALLINT,
        dewp_count  SMALLINT,
        slp_count   SMALLINT,
        stp_count   SMALLINT,
        wdsp_count  SMALLINT,
        visib_count SMALLINT,
        temp_min_f  BOOLEAN,
        temp_max_f  BOOLEAN,
        prcp_flag   CHAR,
        PRIMARY KEY (ts, station)
    ) PARTITION BY RANGE (ts);
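With this schema, typical analysis is plain SQL. A hypothetical example (station id and date range are placeholders) that computes the daily temperature range for one station over a year; the snippet only writes the query to a file so it can be run later against a live database:

```shell
# Write a sample analytical query against isd_daily to a file;
# run it later with: psql "${PGURL}" -f /tmp/isd_daily_demo.sql
cat > /tmp/isd_daily_demo.sql <<'EOF'
SELECT ts,
       temp_min,
       temp_max,
       temp_max - temp_min AS temp_range
FROM isd_daily
WHERE station = '72286023119'   -- placeholder station id
  AND ts >= DATE '2020-01-01'
  AND ts <  DATE '2021-01-01'
ORDER BY ts;
EOF
```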

Update

The ISD daily and ISD hourly datasets are rolling-updated each day. Run the following scripts to load the latest data into the database.

    # download, clean, reload the latest hourly dataset
    bin/get-isd-hourly.sh
    bin/load-isd-hourly.sh

    # download, clean, reload the latest daily dataset
    bin/get-isd-daily.sh
    bin/load-isd-daily.sh

    # recalculate the latest partitions of monthly and yearly data
    bin/refresh-latest.sh
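To keep the database current, these scripts can be scheduled. A hypothetical crontab entry, assuming the repo checkout lives at /opt/isd (path and schedule are placeholders):

```
# m h dom mon dow  command — refresh the daily dataset every day at 04:00
0 4 * * *  cd /opt/isd && bin/get-isd-daily.sh && bin/load-isd-daily.sh && bin/refresh-latest.sh
```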

Parser

There are two parsers, isdh and isdd. Each takes an original NOAA yearly tarball as input and generates CSV as output, which can be consumed directly by the PostgreSQL COPY command.

    NAME
            isdh -- Integrated Surface Dataset Hourly Parser

    SYNOPSIS
            isdh [-i <input|stdin>] [-o <output|stdout>] -p -d -c -v

    DESCRIPTION
            The isdh program takes an isd hourly (yearly tarball) file as input
            and generates csv format as output.

    OPTIONS
            -i <input>      input file, stdin by default
            -o <output>     output file, stdout by default
            -p <profpath>   pprof file path (disabled by default)
            -v              verbose progress report
            -d              de-duplicate rows (raw, ts-first, hour-first)
            -c              add comma separated extra columns
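A hypothetical end-to-end invocation; this is a sketch assuming bin/isdh exists (built or downloaded as described above), that -d takes one of the listed modes, and that the tarball path matches the download scripts' layout. The guard makes it a no-op where the binary is absent:

```shell
# Parse one yearly hourly tarball to CSV on stdout and pipe it
# straight into PostgreSQL COPY (paths and flags are assumptions).
year=2020
if [ -x bin/isdh ]; then
    bin/isdh -i "data/isd-hourly/${year}.tar.gz" -d hour-first \
        | psql "${PGURL}" -c "COPY isd_hourly FROM STDIN CSV"
    status="loaded"
else
    echo "bin/isdh not found, skipping year ${year}"
    status="skipped"
fi
```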

UI

ISD Station

[Figure 2: ISD Station dashboard]

ISD Monthly

[Figure 3: ISD Monthly dashboard]

Last modified 2022-06-03: add scaffold for en docs (6a6eded)