Instant Heatmap with pg_featureserv
The pg_featureserv micro-service is a thin middleware that binds tables and functions in a PostgreSQL database to a JSON collections API, accessible over HTTP. Using the Crunchy Bridge container apps, I'm going to give a quick overview of how to set up a web based spatial heatmap from Postgres.
Application
The application uses PostgreSQL to store and search 2.2M geographic names in the USA. Type in the search box and the auto-fill form will find candidate words. Select a word, and the database will perform a full-text search for all the names that match your word, and return the result to the map. The map displays the result using a heat map.
More names closer together will get more vibrant colors, so you can see name density. Try some regional names: bayou, swamp, cherokee, baptist, cougar. Try it for yourself.
Architecture
Using pg_featureserv to build a demonstration application is pretty easy all you need is:
- PostgreSQL running somewhere
pg_featureserv
running somewhere- A web page hosted somewhere.
Wait, "running somewhere" is doing a lot of work here! Is there any way to do this without become a specialist in managing database and container ops?
Yes, we can do all the heavy lifting with Crunchy Bridge!
- Crunchy Bridge hosts the database.
- Crunchy Bridge container apps
host the
pg_featureserv
microservice and avarnish
web cache. - The static web page goes anywhere that can hold a static web page (S3 in this case).
Prepare the Data
We will build a small web application that can visualize the location of named places in the United States. This is far more interesting than it sounds, because named places carry a lot of local history and context with them, and that context shows up in maps!
Start by downloading the US named places.
wget https://download.geonames.org/export/dump/US.zip
unzip US.zip
For added security, we will connect our pg_featureserv
microservice using an
application account, and make a separate database for the application.
-- as postgres
CREATE DATABASE geonames WITH OWNER = application;
Then create a table to receive the data and load it up.
Create Table and Copy
CREATE TABLE geonames (
geonameid integer primary key,
name text,
asciiname text,
alternatenames text,
latitude float8,
longitude float8,
featureclass text,
featurecode text,
countrycode text,
countrycode2 text,
admin1 text,
admin2 text,
admin3 text,
admin4 text,
population bigint,
elevation integer,
dem integer,
timezone text,
modified date
);
\copy geonames FROM 'US.txt' WITH (
FORMAT csv,
DELIMITER E'\t',
HEADER false,
ENCODING utf8)
Create Indexes
There are 2.2M named places in the database, and we want to very quickly find names that match our query word. Often the names are multi-word ("Gold River", "West Gold Hills") and we will be searching with just one word. This is where full-text search indexing comes in handy.
Full-text indexes can only be built on a tsvector
type. We can either add a
new tsvector
column to our table, and then populate it, or save a little space
and just build a
functional index
on the tsvector
construction function.
CREATE INDEX geonames_name_x
ON geonames
USING GIN (to_tsvector('english', name))
The drop-down form fill needs a list of unique name components, preferably in order of frequency, so "interesting" ones appear at the top of the list. We build that by stripping down and aggregating all the names in the table.
CREATE TABLE geonames_stats AS
SELECT
count(*) AS ndoc,
unnest(regexp_split_to_array(lower(trim(name)), E'[^a-zA-Z]')) AS word
FROM geonames GROUP BY 2;
CREATE INDEX geonames_stats_word_x ON geonames_stats (word text_pattern_ops);
Create Functions to Publish
The web map application needs an HTTP API to connect to, this is where
pg_featureserv
comes in. We will create two functions:
- One function to drive the dropdown form field, which takes the characters currently typed and finds all matching words.
- One function to drive the map, which takes in the query word and returns all matching places.
The magic power of pg_featureserv
is in the ability to publish
user defined functions.
Any function defined in the postgisftw
schema will be published as a web end
point.
Define a function to populate the dropdown form field.
CREATE SCHEMA postgisftw;
DROP FUNCTION IF EXISTS postgisftw.geonames_stats_query;
CREATE FUNCTION postgisftw.geonames_stats_query(
q text DEFAULT 'bea')
RETURNS TABLE(value text, ndoc bigint)
AS $$
BEGIN
RETURN QUERY
SELECT g.word as value, g.ndoc
FROM geonames_stats g
WHERE g.word LIKE q || '%'
ORDER BY g.ndoc DESC
LIMIT 15;
END;
$$
LANGUAGE 'plpgsql'
PARALLEL SAFE
STABLE
STRICT;
Define a function to query for place names.
DROP FUNCTION IF EXISTS postgisftw.geonames_query;
CREATE FUNCTION postgisftw.geonames_query(q text DEFAULT 'beach')
RETURNS TABLE(name text, featureclass text, longitude float8, latitude float8)
AS $$
BEGIN
RETURN QUERY
SELECT
g.name,
g.featureclass,
g.longitude,
g.latitude
FROM geonames g
WHERE to_tsvector('english', g.name) @@ plainto_tsquery('english', q)
ORDER BY md5(g.name)
LIMIT 10000;
END;
$$
LANGUAGE 'plpgsql'
PARALLEL SAFE
STABLE
STRICT;
Run the Microservices
Our architecture uses two microservices: pg_featureserv
to publish the
database functions as web services; and varnish
to protect the feature service
from excessive load if the application gets a lot of traffic.
These services are run as
Crunchy Bridge Container Apps
using an extension called pgpodman
.
CREATE EXTENSION pgpodman;
Once the extension is enabled, the containers can be turned on. This involves some magic strings, primarily the DNS name of the database host, which is available in the connection strings, and also in the container apps user interface.
Run the pg_featureserv
container.
SELECT run_container('
-dt -p 5442:9000/tcp
--log-driver k8s-file
--log-opt max-size=1mb
-e DATABASE_URL="postgres://application:password@p.xxxxxxxxxxxx.db.postgresbridge.com:5432/geonames"
-e PGFS_SERVER_HTTPPORT=9000
-e PGFS_PAGING_LIMITDEFAULT=10000
-e PGFS_PAGING_LIMITMAX=10000
docker.io/pramsey/pg_featureserv:latest');
(Newlines are inserted into this example so you can see how the parameters are
passed to the container. Environment variables to configure the service are
passed in with -e
.)
Run the varnish
container.
SELECT run_container('
-dt -p 5435:6081/tcp
--log-driver k8s-file
--log-opt max-size=1mb
-e BACKENDS=p.fhlzf432a5gmvaj456jql4a4di.db.postgresbridge.com:5442
-e DNS_ENABLED=false
-e COOKIES=true
-e PARAM_VALUE="-p default_ttl=3600"
docker.io/pramsey/varnish:latest');
The varnish
container is the "outward facing" service, so web requests should
come into port 5435, where they will be passed to the varnish process inside the
container on port 6081. Then varnish
will call out to its BACKEND
host
(pg_featureserv
) this time on port 5442, which in turn is proxied into port
9000 inside the container.
Try it Manually
Once the services are running, you can hit them manually from the outside.
http://p.fhlzf432a5gmvaj456jql4a4di.db.postgresbridge.com:5435/functions/geonames_query/items.json?q=cougar
http://p.fhlzf432a5gmvaj456jql4a4di.db.postgresbridge.com:5435/functions/geonames_stats_query/items.json?q=cougar
Of course, it is much more fun to use the functions with the web map!
Conclusions
- Publishing web services using the Crunchy Bridge Container Apps is a neat way to quickly stand up web services.
- PostgreSQL full-text indexes can very quickly search very large corpuses of text.
- Seeing data visually in a map is a great way to explore it!