DWD SQLite database archive¶
About¶
This page contains information about SQLite database files produced by the
phenodata program and its export-observations-all
subcommand.
Prior art¶
Jonas Dierenbach, Franz-W. Badeck, and Jörg Schaber presented PPODB, the Plant-Phenological Online Database, that provides unrestricted and free access to over 16 million plant phenological observations, mostly in Germany.
Its PPODB SQL interface offers online SQL access to the database, running on premises of the Institute for Theoretical Biophysics at the Humboldt University.
In the same spirit, we are producing and publishing SQLite database files about the same matter, for everyone to run database queries on their own hardware.
See also
In order to learn the important details about PPODB, we recommend reading the PPODB summary. More details can be discovered within the scientific paper The plant phenological online database (PPODB) » an online database for long-term phenological data, and the Plant-Phenological Online Database (PPODB) handbook.
Data source¶
Data and metadata is acquired from the DWD CDC Open Data Server, specifically
the observations_germany/phenology
and help
directories.
SQLite database files¶
The data folder contains four SQLite database files. Before downloading and using them, please recognize the DWD data copyright notices referenced below at Data copyright.
phenodata-dwd-annual-historical.sqlite
(1.7 GB)phenodata-dwd-annual-recent.sqlite
(24 MB)phenodata-dwd-immediate-historical.sqlite
(90 MB)phenodata-dwd-immediate-recent.sqlite
(5.5 MB)
Usage¶
Getting started¶
Let’s start by defining the database name, and downloading the file.
export DBPATH=phenodata-dwd-immediate-historical.sqlite
wget --no-clobber "https://phenodata.hiveeyes.org/data/dwd/${DBPATH}"
Inquire the database schema.
# Display all tables and views.
sqlite3 "${DBPATH}" '.tables'
# Display schema of all tables and views.
sqlite3 "${DBPATH}" '.fullschema --indent'
# Display database metadata information.
sqlite3 "${DBPATH}" -csv -header 'SELECT * FROM dwd_about'
The database about historical observations from immediate reporters contains ~250,000 records as of 2023.
sqlite3 "${DBPATH}" 'SELECT COUNT(*) FROM dwd_phenology;'
252378
It spans the time range of observations between 1979 and 2021.
sqlite3 "${DBPATH}" 'SELECT MIN(reference_year), MAX(reference_year) FROM dwd_phenology;'
1979|2021
Run a query on the dwd_phenology
view, with output in CSV format.
sqlite3 -csv -header "${DBPATH}" 'SELECT * FROM dwd_phenology ORDER BY date;'
The same query, but more suitable when aiming to write your query using multiple lines, for example within a program or script file.
sqlite3 -csv -header "${DBPATH}" <<SQL
SELECT * FROM dwd_phenology ORDER BY date;
SQL
PPODB examples¶
Those examples have been taken from the PPODB SQL interface page, and slightly adjusted to use the DWD/Phenology/SQLite database schema.
To invoke those queries, start an interactive shell using sqlite3
:
sqlite3 "${DBPATH}" -csv -header
At first, you usually want to get an overview over the database and list all available tables:
.tables
Often, you want to check whether a certain table contains the information you are interested in. Therefore, you want to have a quick overlook over the columns in the table of interest:
.schema dwd_phase --indent
Stations and observations are uniquely referenced by identifiers. Therefore, it is safer and more efficient to access phenological observations by their identifiers rather than by names. The identifiers of all stations with a name similar to “Geisenheim” can be retrieved with the query:
SELECT station_id, station_full
FROM dwd_phenology
WHERE station_full LIKE '%Geisenheim%';
Within the database, a combination of plant and phase is referenced by a single unique identifier, which is handy:
SELECT *
FROM dwd_phenology
WHERE
species_name_en LIKE '%hazel%' AND
phase_name_en LIKE '%flowering%';
With a station-id and a phase-id, you can efficiently retrieve time series, e.g. flowering of hazel at Geisenheim (DWD):
SELECT day_of_year, reference_year, source, species_name_en
FROM dwd_phenology
WHERE
station_id=19476 AND
species_id=113 AND
phase_id=5
ORDER BY reference_year, day_of_year;
Contrary to PPODB’s recommendation, we think it is acceptable to use human-readable labels for querying. If you will discover this to be a bottleneck for your application, please consider adding additional indexes:
SELECT day_of_year, reference_year, source, species_name_en
FROM dwd_phenology
WHERE
station_full LIKE '%Geisenheim%' AND
species_name_en LIKE '%hazel%' AND
phase_name_en LIKE '%flowering%'
ORDER BY reference_year, day_of_year;
You can also ask more complex questions, e.g. which of the following plants flowered earliest after 1951 on average, hazelnut, chestnut or birch?:
SELECT AVG(day_of_year) mean, reference_year, species_name_en plant, phase_name_en phase
FROM dwd_phenology
WHERE phase_name_en LIKE '%flowering%'
GROUP BY phase_id ORDER BY mean;
or, e.g. how many single station time series are there that have a certain length including all lengths and phases (see Figure 1 in the documentation)?:
SELECT c, COUNT(c) FROM
(
SELECT station_id AS sid, phase_id AS pid, COUNT(DISTINCT reference_year) AS c
FROM dwd_phenology
WHERE phase_id != 0 GROUP BY station_id, phase_id
)
AS sq GROUP BY c;
Specialist’s toolbox¶
At phenological calendar for foraging plants, we are discussing the development of a convenient phenological calendar for beekeepers. Here, we are presenting corresponding database queries suitable for that purpose.
In order to query the database for multiple plants conveniently, there is the
dwd_species_group
table, derived from phenodata’s presets.json file.
The statement below uses the group mellifera-de-primary-openhive
, to list
all observations of “flowering” events for primary foraging plants of honeybees
(apis mellifera), filtering by location on behalf of the synthesized
station_full
field:
SELECT
reference_year,
day_of_year,
source,
species_name_de,
phase_name_de,
station_name
FROM dwd_phenology_group
WHERE true
AND group_name = 'mellifera-de-primary-openhive'
AND phase_name_en LIKE '%flowering%'
AND station_full LIKE '%brandenburg%';
In order to list the available plant group names, query the dwd_species_group
table:
SELECT
dwd_species.*
FROM dwd_species_group, dwd_species
WHERE true
AND dwd_species_group.species_id=dwd_species.id
AND group_name='mellifera-de-primary-openhive';
id,species_name_de,species_name_en,species_name_la
205,Winterraps,"winter oilseed rape","Brassica napus var. napus"
209,Sonnenblume,sunflower,"Helianthus annuus"
215,Mais,maize,"Zea mays"
310,Apfel,apple,"Malus domestica"
320,Birne,pear,"Pyrus communis"
330,"Süßkirsche",cherry,"Prunus avium"
340,Sauerkirsche,morello,"Prunus cerasus"
382,Himbeere,raspberry,"Rubus idaeus"
383,Brombeere,blackberry,"Rubus fructicosus"
113,Hasel,"common hazel","Corylus avellana"
114,Heidekraut,"common heather","Calluna vulgaris"
120,"Löwenzahn",dandelion,"Taraxacum officinale"
121,Robinie,"black locust","Robinia pseudoacacia"
122,Rosskastanie,"horse chestnut","Aesculus hippocastanum"
124,Sal-Weide,"goat willow","Salix caprea"
131,Spitz-Ahorn,"Norway maple","Acer platanoides"
137,Winter-Linde,"small leafed lime","Tilia cordata"
Note
If you have a different use case, or think the existing species groups should be
expanded, do not hesitate to drop us a line by creating an issue, in order to
propose changes to the dwd_species_group
table.
Attributions¶
Data copyright¶
All information on the web pages of the DWD is protected by copyright. As laid down in the Ordinance Setting the Terms of Use for the Provision of Federal Spatial Data (GeoNutzV), all spatial data and spatial data services available “for free” access may be used without any restrictions provided that the source is acknowledged. When speaking of spatial data, this also includes any location-related weather and climate information presented on the DWD open web pages.
Any other content presented on DWD web pages, in whole or extracts thereof, may be reproduced, altered, distributed, used or publicly presented only if expressly permitted by the DWD.
Acknowledgements¶
Thanks to the many observers of »Deutscher Wetterdienst« (DWD), the »Global Phenological Monitoring programme« (GPM), and all people working behind the scenes for their commitment on recording observations and making the excellent datasets available to the community. You know who you are.
Reproduce¶
This section explains how to export all available datasets into corresponding
SQLite database files, on your own machine, using the export-observations-all
subcommand.
The process will take about five to ten minutes, based on the capacity of your computing device. Processing the immediate/historical+recent and annual/recent data is pretty fast. The annual/historical data however, as the largest one with a size of ~1.7 GB, takes the major share of computing time on the export operation.
phenodata export-observations-all --source=dwd
The command will create four SQLite database files, they can be consumed using the sqlite3
command, or other tools.
Tip
If you want to create database files by selecting individual subsets of the data, please refer to the SQLite database export documentation.
Note
The cache directory, for example located at /Users/<username>/Library/Caches/phenodata
on macOS machines, will hold all the data downloaded from DWD servers. It is about
160 MB in size for both of the “recent” datasets, while immediate-historical
weighs in with about 500 MB, and annual-historical
with about another 3 GB.
Upload¶
rsync -azuv phenodata-dwd-*.sqlite root@elbanco.hiveeyes.org:/var/lib/phenodata/dwd
Backlog¶
Todo
[o] Publish using datasette
[o] Publish using Grafana SQLite Datasource
[o] Outline other end-user tools to consume the databases
[o] Implement
phenodata.open_database("dwd", "immediate", "recent")
to consume the databases
Enjoy your research.