patternsqlMinor
Best way to model a database structure to store multidimensional variables?
Viewed 0 times
multidimensionalwaystoredatabasestructurevariablesmodelbest
Problem
I am working with some NetCDF data which has multidimensional variables.
For example, a
I have about 14 variables that I need to store.
Data retrieval tendencies
Now, I'm going to need to query this data and ask questions like:
Current considerations
I could let each row in my DB be an
Or, perhaps create a table for each kind of
I'm not sure which will be more appropriate, or whether there is a better option, so I'd appreciate some advice.
- As far as I'm aware, the fields are not mandatory.
- Each variable will have the same fields/attributes.
- Each variable represents a measurement of the atmosphere, in time & space.
Levelrepresents a pressure level (like a layer of an onion -- a thick layer).
For example, a
variable called SO2 (Sulphur Dioxide) may be 4-D: Time, Level (pressure), latitude, longitude.I have about 14 variables that I need to store.
Data retrieval tendencies
Now, I'm going to need to query this data and ask questions like:
- Give me all the rows for
variableSO2 that fall betweenlatitudeX andlongitudeY attimeT andlevelL.
- What is the
average temperaturefortimeT atlevelL?
- etc...
Current considerations
I could let each row in my DB be an
observation and have the columns: type, lat, long, time, level, value.Or, perhaps create a table for each kind of
variable and have the columns: lat, long, time, level, value.I'm not sure which will be more appropriate, or whether there is a better option, so I'd appreciate some advice.
Solution
So you have some basic data.
What you want to do is create a point with lat, and long...
Now, we just store that into a table..
Give me all the rows for variable SO2 that fall between latitude X and longitude Y at time T and level L.
This is not how GIS works, because between changes on a sphere. So instead, just use distance. So instead what we do
What is the average temperature for time T at level L?
For this we would have had to store the temperature, assuming we did -- which we didn't. It'd look like this.
As a side note ST_DWithin will use an index.. (as will level and timestamp if you create the indexes). However, if you're new to GIS you create GIS indexes with GIST and not btree.
For a similar question see this answer.
UPDATE
This file is a complex proprietary raster,
You can import it by generating PAM files.
Them remove the
This gets some kind of raster into the database. Now you have to start learning the PostGIS GDAL raster functions to see what and how you can work with it.. Using ST_MetaData we can see you have a 12 rasters with 90 bands (with
```
SELECT rid, (metadata).* FROM foo.foo1 CROSS JOIN LATERAL st_metadata(rast) AS metadata;
rid | upperleftx | upperlefty | width | height | scalex | scaley | skewx | skewy | srid | numbands
-----+--------------------+------------+-------+--------+-------------------+--------+-------+-------+------+----------
1 | -0.200000003394614 | 90.2 | 900 | 451 | 0.400000006789228 | -0.4 | 0 | 0 | 0 | 90
2 | -0.200000003394614 | 90.2 | 900 | 451 | 0.400000006789228 | -0.4 | 0 | 0 | 0 | 90
3 | -0.200000003394614 | 90.2 | 900 |
SELECT timestamp::timestamp, level, lat, long
FROM ( VALUES
(now(), 7, 1, 9)
)
AS t(timestamp, level, lat, long);
timestamp | level | lat | long
----------------------------+-------+-----+------
2017-02-08 15:38:54.903155 | 7 | 1 | 9What you want to do is create a point with lat, and long...
SELECT timestamp::timestamp, level, ST_AsText(point::geography)
FROM ( VALUES
(now(), 7, 1, 9)
)
AS t(timestamp, level, lat, long)
CROSS JOIN LATERAL ST_SetSRID( ST_MakePoint(long,lat), 4326 )
AS point;
timestamp | level | st_astext
----------------------------+-------+------------
2017-02-08 15:38:00.892956 | 7 | POINT(9 1)
(1 row)Now, we just store that into a table..
CREATE TEMP TABLE foo
AS
SELECT timestamp::timestamp, level, point::geography
FROM ( VALUES
(now(), 7, 1, 9)
)
AS t(timestamp, level, lat, long)
CROSS JOIN LATERAL ST_SetSRID( ST_MakePoint(long,lat), 4326 )
AS point;Give me all the rows for variable SO2 that fall between latitude X and longitude Y at time T and level L.
This is not how GIS works, because between changes on a sphere. So instead, just use distance. So instead what we do
SELECT *
FROM foo
WHERE ST_DWithin(
(ST_MakePoint(long,lat),4326)::geography,
point,
1000 -- distance in meters
)
FROM foo;What is the average temperature for time T at level L?
For this we would have had to store the temperature, assuming we did -- which we didn't. It'd look like this.
SELECT avg(temp)
FROM foo
WHERE level = l
AND time BETWEEN start AND finish;As a side note ST_DWithin will use an index.. (as will level and timestamp if you create the indexes). However, if you're new to GIS you create GIS indexes with GIST and not btree.
CREATE INDEX ON foo USING gist ( point );For a similar question see this answer.
UPDATE
This file is a complex proprietary raster,
gdalinfo file.nc
Driver: netCDF/Network Common Data Format
Files: file.nc
Size is 512, 512
Coordinate System is `'
Metadata:
NC_GLOBAL#Conventions=CF-1.6
NC_GLOBAL#history=2017-02-08 12:30:09 GMT by grib_to_netcdf-2.0.2: grib_to_netcdf /data/data01/scratch/_mars-atls17-95e2cf679cd58ee9b4db4dd119a05a8d-mEMhIr.grib -o /data/data01/scratch/_grib2netcdf-atls00-95e2cf679cd58ee9b4db4dd119a05a8d-lrYLIt.nc -utime
Subdatasets:
SUBDATASET_1_NAME=NETCDF:"file.nc":aermr01
SUBDATASET_1_DESC=[45x2x451x900] aermr01 (16-bit integer)
SUBDATASET_2_NAME=NETCDF:"file.nc":aermr02
SUBDATASET_2_DESC=[45x2x451x900] aermr02 (16-bit integer)
SUBDATASET_3_NAME=NETCDF:"file.nc":aermr03
SUBDATASET_3_DESC=[45x2x451x900] aermr03 (16-bit integer)
SUBDATASET_4_NAME=NETCDF:"file.nc":aermr04
SUBDATASET_4_DESC=[45x2x451x900] aermr04 (16-bit integer)
SUBDATASET_5_NAME=NETCDF:"file.nc":aermr05
SUBDATASET_5_DESC=[45x2x451x900] aermr05 (16-bit integer)
SUBDATASET_6_NAME=NETCDF:"file.nc":aermr06
SUBDATASET_6_DESC=[45x2x451x900] aermr06 (16-bit integer)
SUBDATASET_7_NAME=NETCDF:"file.nc":aermr11
SUBDATASET_7_DESC=[45x2x451x900] aermr11 (16-bit integer)
SUBDATASET_8_NAME=NETCDF:"file.nc":so2
SUBDATASET_8_DESC=[45x2x451x900] so2 (16-bit integer)
SUBDATASET_9_NAME=NETCDF:"file.nc":z
SUBDATASET_9_DESC=[45x2x451x900] geopotential (16-bit integer)
SUBDATASET_10_NAME=NETCDF:"file.nc":t
SUBDATASET_10_DESC=[45x2x451x900] air_temperature (16-bit integer)
SUBDATASET_11_NAME=NETCDF:"file.nc":q
SUBDATASET_11_DESC=[45x2x451x900] specific_humidity (16-bit integer)
SUBDATASET_12_NAME=NETCDF:"file.nc":r
SUBDATASET_12_DESC=[45x2x451x900] relative_humidity (16-bit integer)
Corner Coordinates:
Upper Left ( 0.0, 0.0)
Lower Left ( 0.0, 512.0)
Upper Right ( 512.0, 0.0)
Lower Right ( 512.0, 512.0)
Center ( 256.0, 256.0)You can import it by generating PAM files.
gdal_translate -sds file.nc fileThem remove the
.aux.xml files with rm *.aux.xml; And, run this to import them all into the databaseraster2pgsql file_* mySchema.mytable | psql -d databaseThis gets some kind of raster into the database. Now you have to start learning the PostGIS GDAL raster functions to see what and how you can work with it.. Using ST_MetaData we can see you have a 12 rasters with 90 bands (with
ST_BandMetaData we can see they're 16bit unsigned).```
SELECT rid, (metadata).* FROM foo.foo1 CROSS JOIN LATERAL st_metadata(rast) AS metadata;
rid | upperleftx | upperlefty | width | height | scalex | scaley | skewx | skewy | srid | numbands
-----+--------------------+------------+-------+--------+-------------------+--------+-------+-------+------+----------
1 | -0.200000003394614 | 90.2 | 900 | 451 | 0.400000006789228 | -0.4 | 0 | 0 | 0 | 90
2 | -0.200000003394614 | 90.2 | 900 | 451 | 0.400000006789228 | -0.4 | 0 | 0 | 0 | 90
3 | -0.200000003394614 | 90.2 | 900 |
Code Snippets
SELECT timestamp::timestamp, level, lat, long
FROM ( VALUES
(now(), 7, 1, 9)
)
AS t(timestamp, level, lat, long);
timestamp | level | lat | long
----------------------------+-------+-----+------
2017-02-08 15:38:54.903155 | 7 | 1 | 9SELECT timestamp::timestamp, level, ST_AsText(point::geography)
FROM ( VALUES
(now(), 7, 1, 9)
)
AS t(timestamp, level, lat, long)
CROSS JOIN LATERAL ST_SetSRID( ST_MakePoint(long,lat), 4326 )
AS point;
timestamp | level | st_astext
----------------------------+-------+------------
2017-02-08 15:38:00.892956 | 7 | POINT(9 1)
(1 row)CREATE TEMP TABLE foo
AS
SELECT timestamp::timestamp, level, point::geography
FROM ( VALUES
(now(), 7, 1, 9)
)
AS t(timestamp, level, lat, long)
CROSS JOIN LATERAL ST_SetSRID( ST_MakePoint(long,lat), 4326 )
AS point;SELECT *
FROM foo
WHERE ST_DWithin(
(ST_MakePoint(long,lat),4326)::geography,
point,
1000 -- distance in meters
)
FROM foo;SELECT avg(temp)
FROM foo
WHERE level = l
AND time BETWEEN start AND finish;Context
StackExchange Database Administrators Q#163637, answer score: 3
Revisions (0)
No revisions yet.