
How to get geo coordinates for POIs and show them on openstreetmap without any database backend, cgi, php or other external database service - just by using symlinks and the gatling http daemon - or, as an alternative solution, cdb, ucspi-tcp and daemontools

by reinhard@finalmedia.de

2015/07/06, Update:2020/02/02

I found a hackish solution for this problem.

Why hackish?

Sure, I could also choose the boring way and use plain javascript, a json file for every country, and load POI data with ajax on demand. You would have to specify the target country first in the search form (or filter it), get your valid json or geojson file with ajax from the server and parse it for coordinates. You can handle the whole thing in javascript, et voilà, done. But this is boring and that's not what I want.

I want a solution additionally satisfying the following needs:

So here is a solution.

See a live Demo in Action: http://cdn.osterbruecken.de/ostermap (german).

How is it done?

First I fetched a dump of the geonames database. For this test case I just needed data for the region Germany, so I fetched this file: http://download.geonames.org/export/dump/DE.zip

I extracted the file DE.txt from it, parsed the tab separated file (tsv) with tr and cut (or you can use awk if you like it) and used grep to get all POIs, which are marked with ";P".

I reduced the charset and transformed them to lowercase, just allowing the following characters:

	a-zöäü ß .-

you can use the following chain to do that

	tr "\t" ";" < DE.txt | cut -d";" -f2,5,6,7 | grep ";P" | \
	tr -d "," | cut -d";" -f1,2,3 | tr ";" "," |\
	tr -dc "0-9a-zA-ZöäüÖÄÜß\n ,.-" | tr "A-ZÖÄÜ" "a-zöäü" > cities.txt

this will export all lines to a new file, called cities.txt based on the following format:

	city,lat,lon
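
The awk variant mentioned above could look like the following. This is only a minimal sketch, assuming the geonames dump layout (tab separated, name in column 2, latitude and longitude in columns 5 and 6, feature class in column 7). The function name geonames_to_csv is made up for this example. Unlike the tr chain it does not reduce the charset, and tolower only handles umlauts if your awk is locale-aware.

```shell
#!/bin/sh
# geonames_to_csv: hypothetical helper, reads a geonames dump on stdin
# and prints "name,lat,lon" for all entries of feature class "P".
geonames_to_csv() {
    awk -F'\t' '$7 == "P" {
        name = $2
        gsub(/,/, "", name)     # commas would break the csv format
        print tolower(name) "," $5 "," $6
    }'
}

# usage: geonames_to_csv < DE.txt > cities.txt
```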

UPDATE: The database of geonames.org was not very satisfying. So I used the official openstreetmap database dumps from http://download.geofabrik.de, in this case germany-latest.osm.pbf (>2.4 GB, uncompressed around 40 GB), and extracted all cities or street names from it. Use osmconvert for extracting the data. Hint: build a 64bit executable from osmconvert.c and use a machine with a lot of RAM for this! Processing the dataset germany-latest.osm will need about 14 GB of RAM and will take some hours to finish.

	./osmconvert germany-latest.osm.pbf --max-objects=900000000 --all-to-nodes \
	--csv="name @lat @lon" --csv-separator="," | grep -v -E "^," > cities.txt

Process your cities.txt and sort out all duplicate lines. This is a quick hack: names that occur several times with different coordinates stay in the list, perhaps I will rename those in an improved version later.

	sort -k1 -t, cities.txt | uniq > uniq_cities.txt

All cities are stored in the file uniq_cities.txt now, line by line with their coordinates, like this:

	zwötzen,50.84858,12.08635
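
Note that sort with uniq only drops lines that are completely identical. To see which names still occur more than once, or to keep just one line per name, something like the following could be used. It is a sketch only, and the function names duplicate_names and one_per_name are made up for this example.

```shell
#!/bin/sh
# duplicate_names: hypothetical helper, reads "name,lat,lon" lines on stdin
# and prints every name that occurs on more than one line.
duplicate_names() {
    cut -d, -f1 | sort | uniq -d
}

# one_per_name: hypothetical helper, keeps a single line for each name
# by comparing only the first comma separated field.
one_per_name() {
    sort -t, -k1,1 -u
}

# usage: duplicate_names < uniq_cities.txt | head
```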

then I wrote a small script, that reads those lines and makes lots of broken symlinks out of it, just putting them into a folder called "search".

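	#!/bin/bash
	# make one broken symlink per line "name,lat,lon" of uniq_cities.txt
	mkdir -p search
	while IFS= read -r line
	do
	# target: map url built from lat and lon
	url="http://osm.org/#map=/$(echo "$line" | tr -dc "0-9.,-" | cut -d"," -f2,3 | tr "," "/")"
	# link name: lowercase city name with reduced charset
	symlink="search/$(echo "$line" | cut -d"," -f1 | tr -dc "a-zA-ZöäüÖÄÜß. -" | tr "A-ZÖÄÜ" "a-zöäü")"
	ln -s "$url" "$symlink"
	done < uniq_cities.txt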

The name of the broken symlink is the name of the city, and the symlink points to a URL like this

	http://osm.org/#map=/

with the given coordinates of the city.

sure, this is just an example. you can use your own tileserver and your own map, like I did in the demo "Ostermap" mentioned before.

In this way you'll get a lot of broken symlinks like these:

...
lrwxrwxrwx 1 user group  23 Jun 23 23:00 ührde -> http://osm.org/#map=/51.70547/10.20814
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhrendorf -> http://osm.org/#map=/53.86275/9.41756
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhrsleben -> http://osm.org/#map=/52.20087/11.26443
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhry -> http://osm.org/#map=/52.29693/10.85758
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhsmannsdorf -> http://osm.org/#map=/51.33048/14.90316
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhyst -> http://osm.org/#map=/51.36469/14.506
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uhyst am taucher -> http://osm.org/#map=/51.19249/14.21843
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uichteritz -> http://osm.org/#map=/51.20652/11.92215
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uiffingen -> http://osm.org/#map=/49.5024/9.59269
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uigenau -> http://osm.org/#map=/49.31204/11.01731
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uigendorf -> http://osm.org/#map=/48.18048/9.57969
lrwxrwxrwx 1 user group  23 Jun 23 23:00 uissigheim -> http://osm.org/#map=/49.67984/9.57134
lrwxrwxrwx 1 user group  23 Jun 23 23:00 ulbargen -> http://osm.org/#map=/53.37535/7.58291
lrwxrwxrwx 1 user group  23 Jun 23 23:00 ulbering -> http://osm.org/#map=/48.35362/13.01465
lrwxrwxrwx 1 user group  23 Jun 23 23:00 ulberndorf -> http://osm.org/#map=/50.87472/13.67231
...
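
A single entry can also be read back on the command line without any http server, because the coordinates are just the target of the symlink. A minimal sketch, assuming the URL layout shown above; poi_coords is a made-up helper name.

```shell
#!/bin/sh
# poi_coords: hypothetical helper. $1 = directory with the symlinks,
# $2 = POI name. Prints "lat,lon" taken from the symlink target.
poi_coords() {
    target=$(readlink "$1/$2") || return 1
    # drop everything up to "=", then take the last two path segments,
    # so an optional zoom level in front of the coordinates is ignored
    printf '%s\n' "${target##*=}" | awk -F/ '{ print $(NF-1) "," $NF }'
}

# usage: poi_coords search ulbargen
```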

Why those broken symlinks?

Now, I can use those broken symlinks with gatling httpd, a tiny and really fast httpd server by Felix von Leitner.

I already used gatling for leaflet, my own maps and tiles I made with glosm.

You can find the project "Ostermap" right here. I just rendered the map for Saarland.

Gatling recognizes broken symlinks. If their target contains "://" (like in "http://" or "https://"), it will make a valid http redirect out of it and redirect your browser to the given URL. This is a really nice feature. Thanks, fefe. In this way it will redirect any name of the given city to leaflet or openstreetmap with the given coordinates.

So I can start a locally listening gatling and enter the following url in a browser

	http://127.0.0.1/search/ulbargen

to get the geo-location of the city ulbargen and directly show it on the map.
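
The rule can be tried out in the shell without a running server. The following is only an emulation of the behaviour described above, not gatling's actual code; the function name resolve and its output strings are made up for this sketch.

```shell
#!/bin/sh
# resolve: hypothetical emulation of the rule described in the text.
# $1 = path below the document root.
resolve() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        # broken symlink: redirect only if the target looks like a URL
        target=$(readlink "$1")
        case $target in
            *://*) printf 'redirect %s\n' "$target"; return 0 ;;
        esac
    elif [ -e "$1" ]; then
        printf 'serve %s\n' "$1"
        return 0
    fi
    printf 'not found\n'
    return 1
}

# usage: resolve search/ulbargen
```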

Furthermore, you can supply a minimal vanilla javascript that gets the value of an input box, transforms the given text to lowercase and calls the url described above by rewriting window.location to the subfolder, for example "search/ulbargen". This tiny index.htm would do the magic:

	<html>
	<input title="please enter the name of the point of interest" id="name" value="ulbargen">
	<input type="button" value="suche" onclick="window.location+='search/'+document.getElementById('name').value.toLowerCase()">
	</html>

If an invalid POI name is entered, gatling just responds with 404 file not found. You can additionally write an ajax script that catches the 404 response and shows something like "sorry, POI not found. please retry". Or specify your own 404 error page.

Ok. I got it. But why use broken symlinks and not just regular files fetched with javascript?

First: By storing the geo information in a broken symlink I can implement a very compact storage of those coordinates, independent of the block size the underlying filesystem defines for a single regular file.

Storing the coordinates in regular files, also named after the city, is not very efficient: The whole "database" of this example would need over 240MB in total. Even if a regular file just contains those few bytes for the coordinates, it occupies about 4096 bytes on your drive because of the block size (see wikipedia if you want to know more about this). So lots of small regular files would waste a lot of storage capacity.

If you don't believe it, just have a look at such files and compare the sizes with the following commands

	echo hello > regular_file
	ls -slh1 regular_file
	stat regular_file
	du -hcs regular_file

	ln -s "hello again" symlink_file
	ls -slh1 symlink_file
	stat symlink_file
	du -hcs symlink_file

Sure, you can change the block size of your filesystem by reformatting the block device or possibly do some tweaks with tune2fs. But even then the minimal block size of ext3 would be 1024 bytes, and these changes would be no out-of-the-box solution and could lead to disadvantages for other services on your system.

you will find some more information about this topic here, here and here.

When using symlinks, the whole "database" is just about 1.8 MB in total, since each symlink and inode just needs those 128 bytes in this case. It won't get "blown up" to 4k by the minimum block size of the underlying filesystem.
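
To reproduce the comparison with more than one file, a small test like this could be used. It is a sketch only: compare_storage is a made-up helper, and the numbers it prints depend on your filesystem and its block size.

```shell
#!/bin/sh
# compare_storage: hypothetical helper. Creates $1 regular files and
# $1 broken symlinks with the same payload in a temporary directory and
# prints the allocated size of both directories in KiB as
# "files_kb symlinks_kb".
compare_storage() {
    n=$1
    tmp=$(mktemp -d) || return 1
    mkdir "$tmp/files" "$tmp/links"
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "http://osm.org/#map=/51.70547/10.20814" > "$tmp/files/poi$i"
        ln -s "http://osm.org/#map=/51.70547/10.20814" "$tmp/links/poi$i"
        i=$((i + 1))
    done
    files_kb=$(du -sk "$tmp/files" | cut -f1)
    links_kb=$(du -sk "$tmp/links" | cut -f1)
    rm -rf "$tmp"
    echo "$files_kb $links_kb"
}

# usage: compare_storage 1000
```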

now you can also make a tarball out of the folder for distributing the "database". the xz tarball is about 1.1 MB then.

you simply can add a new POI by doing this

	ln -s "http://osm.org/#map=/49.49361/7.26694" "osterbrücken"

and remove it, just by deleting the symlink

	rm osterbrücken

you also can specify zoom-level for individual POIs, if you want to. just use:

	ln -s "http://osm.org/#map=14/49.49361/7.26694" "osterbrücken"

Improvements

You could also distribute street names in this way, for example by making cities subfolders and putting street POIs into them as symlinks. This would work with the built-in directory indexing of gatling.

Since city names are not unique, I should also consider generating folders for duplicate names and putting each symlink into that folder.
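
The subfolder idea could be sketched like this. Everything here is an assumption for illustration: the helper name add_street, the separate base directory and the zoom level 17 are not part of the setup described above. A separate base directory is used because a city cannot be a symlink and a folder of the same name at once.

```shell
#!/bin/sh
# add_street: hypothetical helper for the layout sketched in the text.
# $1 = base directory, $2 = city, $3 = street, $4 = lat, $5 = lon
add_street() {
    mkdir -p "$1/$2" || return 1
    # zoom level 17 is an arbitrary choice for street level
    ln -s "http://osm.org/#map=17/$4/$5" "$1/$2/$3"
}

# usage (placeholder names and coordinates):
# add_street streets examplecity examplestreet 49.1 7.2
```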

Alternatives

Have a look at RFC 5870, which describes the URI scheme for geo coordinates: "A Uniform Resource Identifier for Geographic Locations" in WGS-84 (World Geodetic System). But then you have to evaluate this URI in your client application. Also, since gatling expects "://" and the geo URI is just "geo:74.4294,19.0245", you won't get a successful redirect. You would have to change the source code of gatling in http.c to parse this correctly.
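
Evaluating such a URI on the client side is not hard. A minimal sketch, assuming only the simple form "geo:lat,lon" with an optional altitude and optional ";" parameters; parse_geo_uri is a made-up helper and does no range checking.

```shell
#!/bin/sh
# parse_geo_uri: hypothetical helper. $1 = geo URI, prints "lat lon".
parse_geo_uri() {
    case $1 in
        geo:*) ;;
        *) return 1 ;;
    esac
    coords=${1#geo:}        # strip the scheme
    coords=${coords%%;*}    # strip parameters like ";u=35"
    printf '%s\n' "$coords" | awk -F, 'NF >= 2 { print $1, $2 }'
}

# usage: parse_geo_uri "geo:74.4294,19.0245"
```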

Downloads

You can fetch my pois.de.txt.xz (51MB) with 4,919,091 entries in the format "name,lat,lon". The dataset is based on an extraction of the openstreetmap database dump (20150701), so it is licensed under the Open Data Commons Open Database License (ODbL), © OpenStreetMap contributors.

UPDATE (20200206):

Here is a Version of current POI Names Dataset as cdb database:

You can use it with libcdb from djb and tcpserver from ucspi-tcp from djb for fast geoname decoding.

You need to apt-get install libcdb1 pv ucspi-tcp dos2unix and you need current osm dumps in pbf file format; you can get them here and here. Then you can generate the cdb files with this script:

#!/bin/bash
# make geonames cdb file from all your current pbf osm export files in current
# working directory. need osmconvert, libcdb and pv
# the file name is passed as $1 instead of being pasted into the sh -c string
find . -name "*.pbf" -exec sh -c 'osmconvert "$1" --max-objects=90000000000 \
--all-to-nodes --csv="name @lat @lon" --csv-separator="," | grep -v -E "^," | \
tr -d "\\t:> " | tr "," " " | pv -N "$1" -l | cdb -m -c "$1.cdb"' sh {} \;

As you can see, I chose to delete spaces. So you have to search for "NeuerWeg" instead of "Neuer Weg".
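
To see what ends up as key and value, the text filter of the script can be run on a sample line without osmconvert and cdb. This is just the filter chain from above wrapped in a function; the name to_cdb_input is made up.

```shell
#!/bin/sh
# to_cdb_input: hypothetical wrapper around the filter chain of the script.
# Reads "name,lat,lon" lines and prints "key lat lon" lines for "cdb -m".
to_cdb_input() {
    grep -v -E "^," | tr -d '\t:> ' | tr "," " "
}

# usage: echo "Neuer Weg,49.1,7.2" | to_cdb_input
```

Lines with an empty name are dropped, and the key is the name with spaces, tabs, ":" and ">" removed.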

The following scripts are your webserver. so you won't need gatling, just tcpserver from ucspi-tcp and cdb utils.

Input on stdin is restricted to a-z, A-Z, 0-9 and "-", so, sorry, no umlauts yet. Todo: add urldecode and allow öäüÖÄÜß in the tr -dc filter on stdin.

Whole Germany

Use with dash not with bash.

#!/bin/dash
# germany_serv.sh
# name to coord list
echo "HTTP/1.1 200 OK"
echo "Content-Type: text/plain"
echo
cdb -m -q germany.20200209.cdb \
"$(timeout 2 head -n1|head -c 128|cut -d " " -f 2|tr -dc "0-9a-zA-Z-")" || echo "no result"
exit 0
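
The way the lookup key is cut out of the request line can be checked on its own. The following sketch wraps the filter chain of the script in a function (the name request_key is made up) and leaves out the timeout.

```shell
#!/bin/sh
# request_key: hypothetical wrapper around the input filter of the script.
# Reads an http request on stdin and prints the sanitized lookup key.
request_key() {
    head -n1 | head -c 128 | cut -d " " -f 2 | tr -dc "0-9a-zA-Z-"
}

# usage: printf 'GET /NeuerWeg HTTP/1.1\r\n\r\n' | request_key
```

Only the second word of the first line is used, and everything except letters, digits and "-" is deleted, so slashes and dots never reach cdb.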

Just Saarland (with redirect header to online map tiles)

Use with dash not with bash.

#!/bin/dash
# saar_serv.sh redirect
echo "HTTP/1.1 302 Moved Temporarily"
echo "Content-Length: 0"
echo -n "Location: https://osterbruecken.de/ostermap/?pos="
cdb -m -q saarland.20191123.cdb \
"$(timeout 2 head -n1|head -c 64|cut -d " " -f 2|tr -dc "0-9a-zA-Z-")" | \
head -n1 | tr -dc "0-9. " | tr " " ","
# end the Location header line, then send the empty line that ends the headers
echo
echo
exit 0
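
The last step of the redirect script turns the stored value "lat lon" into the "lat,lon" part of the Location header. A small sketch of just this filter, with the made-up name location_value:

```shell
#!/bin/sh
# location_value: hypothetical wrapper around the output filter of the
# redirect script. Reads the cdb result on stdin and prints "lat,lon".
location_value() {
    head -n1 | tr -dc "0-9. " | tr " " ","
}

# usage: echo "49.49361 7.26694" | location_value
```

Note that the filter also deletes "-", which is fine for Saarland but would drop the sign of negative coordinates.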

And the wrapper script you can use with daemontools.

#!/bin/sh
# start_saarserv.sh (tcpwrapper)
# run as restricted user!!! (setuidgid)
ulimit 12000
exec setuidgid nobody tcpserver -R -H -D -c 40000 127.0.0.1 8000 recordio ./saar_serv.sh

Connect Benchmark TestScript (uses http@ client from ucspi-tcp)

ulimit 64000; yes | xargs -P 0 sh -c "http@ 127.0.0.1 NeuerWeg 8000"

Hint: To save space in the database, all spaces are removed. So search for "NeuerWeg" instead of "Neuer Weg".

Your Browser-Request would be http://127.0.0.1:8000/NeuerWeg

Update: Thu 17 Sep 23:06:15 CEST 2020

german postcodes (plz) to geo coordinates: plz2geo.txt

RFC 1876 (January 1996) defines DNS LOC records, so you can check dig dkdhr.com LOC +short -> 42 21 43.528 N 71 5 6.284 W -25.00m 1m 3000m 10m.

It's easy to put the plz2geo data into a nameserver. Example: query 66606.plz.cafeface.de to get -> 49.46627 N 7.18821 E. The coordinates have to be encoded in LOC format:


   This RFC defines the format of a new Resource Record (RR) for the
   Domain Name System (DNS), and reserves a corresponding DNS type
   mnemonic (LOC) and numerical code (29).

2. RDATA Format

       MSB                                           LSB
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      0|        VERSION        |         SIZE          |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      2|       HORIZ PRE       |       VERT PRE        |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      4|                   LATITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      6|                   LATITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      8|                   LONGITUDE                   |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     10|                   LONGITUDE                   |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     12|                   ALTITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     14|                   ALTITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   (octet)

where:

VERSION      Version number of the representation.  This must be zero.
             Implementations are required to check this field and make
             no assumptions about the format of unrecognized versions.

SIZE         The diameter of a sphere enclosing the described entity, in
             centimeters, expressed as a pair of four-bit unsigned
             integers, each ranging from zero to nine, with the most
             significant four bits representing the base and the second
             number representing the power of ten by which to multiply
             the base.  This allows sizes from 0e0 (<1cm) to 9e9
             (90,000km) to be expressed.  This representation was chosen
             such that the hexadecimal representation can be read by
             eye; 0x15 = 1e5.  Four-bit values greater than 9 are
             undefined, as are values with a base of zero and a non-zero
             exponent.

             Since 20000000m (represented by the value 0x29) is greater
             than the equatorial diameter of the WGS 84 ellipsoid
             (12756274m), it is therefore suitable for use as a
             "worldwide" size.

HORIZ PRE    The horizontal precision of the data, in centimeters,
             expressed using the same representation as SIZE.  This is
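             the diameter of the horizontal "circle of error", rather
             than a "plus or minus" value.  (This was chosen to match
             the interpretation of SIZE; to get a "plus or minus" value,
             divide by 2.)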

VERT PRE     The vertical precision of the data, in centimeters,
             expressed using the sane representation as for SIZE.  This
             is the total potential vertical error, rather than a "plus
             or minus" value.  (This was chosen to match the
             interpretation of SIZE; to get a "plus or minus" value,
             divide by 2.)  Note that if altitude above or below sea
             level is used as an approximation for altitude relative to
             the [WGS 84] ellipsoid, the precision value should be
             adjusted.

LATITUDE     The latitude of the center of the sphere described by the
             SIZE field, expressed as a 32-bit integer, most significant
             octet first (network standard byte order), in thousandths
             of a second of arc.  2^31 represents the equator; numbers
             above that are north latitude.

LONGITUDE    The longitude of the center of the sphere described by the
             SIZE field, expressed as a 32-bit integer, most significant
             octet first (network standard byte order), in thousandths
             of a second of arc, rounded away from the prime meridian.
             2^31 represents the prime meridian; numbers above that are
             east longitude.

ALTITUDE     The altitude of the center of the sphere described by the
             SIZE field, expressed as a 32-bit integer, most significant
             octet first (network standard byte order), in centimeters,
             from a base of 100,000m below the [WGS 84] reference
             spheroid used by GPS (semimajor axis a=6378137.0,
             reciprocal flattening rf=298.257223563).  Altitude above
             (or below) sea level may be used as an approximation of
             altitude relative to the the [WGS 84] spheroid, though due
             to the Earth's surface not being a perfect spheroid, there
             will be differences.  (For example, the geoid (which sea
             level approximates) for the continental US ranges from 10
             meters to 50 meters below the [WGS 84] spheroid.
             Adjustments to ALTITUDE and/or VERT PRE will be necessary
             in most cases.  The Defense Mapping Agency publishes geoid
             height values relative to the [WGS 84] ellipsoid.


      LOC ( d1 [m1 [s1]] {"N"|"S"} d2 [m2 [s2]]
                               {"E"|"W"} alt["m"] [siz["m"] [hp["m"]
                               [vp["m"]]]] )

   (The parentheses are used for multi-line data as specified in [RFC
   1035] section 5.1.)

   where:

       d1:     [0 .. 90]            (degrees latitude)
       d2:     [0 .. 180]           (degrees longitude)
       m1, m2: [0 .. 59]            (minutes latitude/longitude)
       s1, s2: [0 .. 59.999]        (seconds latitude/longitude)
       alt:    [-100000.00 .. 42849672.95] BY .01 (altitude in meters)
       siz, hp, vp: [0 .. 90000000.00] (size/precision in meters)
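
With the format above, decimal degrees from plz2geo.txt can be converted with a bit of awk. This is a minimal sketch and not the script used for the nameserver mentioned before: the helper names deg_to_loc and deg_to_rdata are made up, the altitude is assumed to be 0 if not given, size and precision are left at their defaults, and values are rounded to the nearest thousandth of a second of arc.

```shell
#!/bin/sh
# deg_to_loc: hypothetical helper. $1 = latitude, $2 = longitude in decimal
# degrees, $3 = altitude in meters (assumed 0 if omitted).
# Prints the data part of a LOC record in master file format.
deg_to_loc() {
    awk -v lat="$1" -v lon="$2" -v alt="${3:-0}" '
    function dms(deg,    ms, d, m) {
        if (deg < 0) deg = -deg
        ms = int(deg * 3600000 + 0.5)   # thousandths of a second of arc
        d = int(ms / 3600000); ms -= d * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        return sprintf("%d %d %.3f", d, m, ms / 1000)
    }
    BEGIN {
        printf "%s %s %s %s %sm\n", dms(lat), (lat < 0 ? "S" : "N"),
                                    dms(lon), (lon < 0 ? "W" : "E"), alt
    }'
}

# deg_to_rdata: hypothetical helper. Prints the 32-bit LATITUDE or
# LONGITUDE value of the RDATA format as a decimal number:
# 2^31 plus the signed offset in thousandths of a second of arc.
deg_to_rdata() {
    awk -v deg="$1" 'BEGIN {
        ms = int((deg < 0 ? -deg : deg) * 3600000 + 0.5)
        printf "%.0f\n", 2147483648 + (deg < 0 ? -ms : ms)
    }'
}

# usage: deg_to_loc 49.46627 7.18821
# prints: 49 27 58.572 N 7 11 17.556 E 0m
```

Here 49.46627 is taken as the latitude and 7.18821 as the longitude of the postcode example.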