Required Hydrological Datasets
- Hindcast/retrospective discharge for every stream segment (reporting point) in the model. This is a time series of
  discharge, i.e. a hydrograph, for each stream segment. The data should be saved in parquet format and named
  `hindcast_series_table.parquet`. The DataFrame should have:
  - An index named `datetime` of type `datetime`, containing the datetime stamp for the simulated values (rows).
  - 1 column per stream, containing the discharge for each time step. The column name is the stream's model ID and
    is of type string.
- Observed discharge data for each gauge, saved as 1 csv file per gauge named `{gauge_id}.csv`. Each file should have:
  - A `datetime` column containing the datetime stamp for the measurements.
  - A column whose name is the gauge's unique `gauge_id`, containing the discharge for each time step.
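Before running anything on the data, it can help to confirm that both datasets match the layout described above. The following is a minimal sketch assuming the data is loaded with pandas; the function names `check_hindcast_schema` and `check_gauge_schema` are hypothetical helpers, not part of any existing package, and the gauge check follows the written description (a discharge column named after the gauge ID).

```python
import pandas as pd


def check_hindcast_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of layout problems in a hindcast table (empty if none)."""
    problems = []
    # The index must hold datetimes and be named "datetime"
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    if df.index.name != "datetime":
        problems.append(f"index is named {df.index.name!r}, expected 'datetime'")
    # One column per stream, named by its model ID stored as a string
    non_str = [c for c in df.columns if not isinstance(c, str)]
    if non_str:
        problems.append(f"non-string column names: {non_str}")
    return problems


def check_gauge_schema(df: pd.DataFrame, gauge_id: str) -> list[str]:
    """Return a list of layout problems in one gauge's observed-discharge table."""
    problems = []
    # The timestamps may be a regular column or already moved into the index
    if "datetime" not in df.columns and df.index.name != "datetime":
        problems.append("no 'datetime' column or index")
    # The discharge column is named after the gauge's unique ID
    if gauge_id not in df.columns:
        problems.append(f"no column named {gauge_id!r}")
    return problems
```

An empty list means the table has the expected shape; the checks say nothing about whether the values themselves are valid.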
The `hindcast_series_table.parquet` file should look like this:
datetime | model_id_1 | model_id_2 | model_id_3 | ... |
---|---|---|---|---|
1985-01-01 | 50 | 50 | 50 | ... |
1985-01-02 | 60 | 60 | 60 | ... |
1985-01-03 | 70 | 70 | 70 | ... |
... | ... | ... | ... | ... |
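A table like the one above could be assembled with pandas roughly as follows. This is a sketch, not a prescribed workflow: `build_hindcast_table` and `write_hindcast_table` are hypothetical helper names, and writing parquet assumes a parquet engine such as pyarrow or fastparquet is installed.

```python
import pandas as pd


def build_hindcast_table(dates, discharge_by_model_id):
    """Assemble a hindcast table in the layout shown above.

    dates: iterable of date strings or datetimes, one per time step.
    discharge_by_model_id: dict mapping model ID -> sequence of discharge values.
    """
    index = pd.DatetimeIndex(pd.to_datetime(list(dates)), name="datetime")
    # Cast every model ID to str so all column names are strings
    data = {str(model_id): list(values) for model_id, values in discharge_by_model_id.items()}
    return pd.DataFrame(data, index=index).astype(float)


def write_hindcast_table(df, path="hindcast_series_table.parquet"):
    # Needs a parquet engine (e.g. pyarrow or fastparquet); the index is written too
    df.to_parquet(path)
```

For example, `build_hindcast_table(["1985-01-01", "1985-01-02", "1985-01-03"], {1: [50, 60, 70], 2: [50, 60, 70]})` produces a table with string columns `"1"` and `"2"`; these model IDs are made up for illustration.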
Each gauge's csv file should look like this:
datetime | discharge |
---|---|
1985-01-01 | 50 |
1985-01-02 | 60 |
1985-01-03 | 70 |
... | ... |
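One way to write and read a gauge file with pandas is sketched below. The helpers `write_gauge_csv` and `read_gauge_csv` are hypothetical, the gauge ID `"g1"` is invented for the example, and an in-memory buffer stands in for a real `{gauge_id}.csv` file.

```python
import io

import pandas as pd


def write_gauge_csv(gauge_id, dates, discharge, path_or_buf=None):
    """Write one gauge's observations with a 'datetime' column and a '<gauge_id>' column."""
    df = pd.DataFrame({"datetime": pd.to_datetime(list(dates)), gauge_id: list(discharge)})
    if path_or_buf is None:
        path_or_buf = f"{gauge_id}.csv"  # one file per gauge, named after the gauge ID
    df.to_csv(path_or_buf, index=False)
    return path_or_buf


def read_gauge_csv(path_or_buf):
    """Read a gauge file back, using the parsed 'datetime' column as the index."""
    return pd.read_csv(path_or_buf, index_col="datetime", parse_dates=["datetime"])


# Round trip through an in-memory buffer instead of a file on disk
buf = io.StringIO()
write_gauge_csv("g1", ["1985-01-01", "1985-01-02", "1985-01-03"], [50, 60, 70], buf)
buf.seek(0)
observed = read_gauge_csv(buf)
```

Reading with `index_col="datetime"` gives the observations the same kind of datetime index as the hindcast table, which makes later comparison between the two straightforward.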
Things to check
Be sure that both datasets:
- Are in the same units (e.g. m³/s)
- Are in the same time zone (e.g. UTC)
- Are in the same time step (e.g. daily average)
- Do not contain any non-numeric values (e.g. ICE or none)
- Do not contain rows with missing values (e.g. NaN or blank cells)
- Have been cleaned of incorrect values (e.g. negative discharge)
- Do not contain any duplicate rows
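Several of these checks can be automated on a pandas DataFrame indexed by datetime. The sketch below is one possible approach, not a required procedure: `clean_discharge` and `step_sizes` are hypothetical helpers, "duplicate rows" is interpreted here as repeated timestamps (dropping repeated values would wrongly remove genuine days with equal discharge), and units and time zone still have to be harmonised by hand because they cannot be inferred from the numbers.

```python
import pandas as pd


def clean_discharge(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the value checks above to a discharge table indexed by datetime."""
    # Coerce non-numeric entries (e.g. "ICE", "none") to NaN
    out = df.apply(pd.to_numeric, errors="coerce")
    # Treat negative discharge as invalid by masking it to NaN
    out = out.mask(out < 0)
    # Drop every row that has a missing value in any column
    out = out.dropna(how="any")
    # Keep only the first row for each repeated timestamp
    out = out[~out.index.duplicated(keep="first")]
    return out.sort_index()


def step_sizes(df: pd.DataFrame) -> set:
    """Distinct spacings between consecutive timestamps; one value means a regular step."""
    return set(df.index.to_series().diff().dropna())
```

Note that dropping rows leaves gaps in the record, so `step_sizes` on cleaned data may report more than one spacing; comparing it on both raw datasets is a quick way to confirm they share the same time step.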