Vector Data Management & Optimization

Author

Jeff McKenna

Contact

jmckenna at gatewaygeomatics.com

Original Author

HostGIS

Last Updated

2022-04-02

Choose the right vector format for your needs

  • The best/optimal vector data source for MapServer in terms of speed-of-display is .shp.

  • For databases, MapServer displays PostGIS layers very fast, and some custom tricks are included in the MapServer source specifically for PostGIS + MapServer draw speed. PostGIS is therefore recommended.

  • SpatiaLite is recommended for those who require a portable format, and works very well with MapServer.

Spend time to review GDAL’s associated driver page for your chosen format

This is a critical step, as MapServer relies on GDAL (actually the OGR part of the project, which handles vectors) for vector data access. Each driver (OGR format) has its own set of abilities and switches. Find your vector format and review its options here.

Connect to your data through OGR/GDAL

For data management in MapServer, this should always be one of your first steps. Sometimes desktop GIS programs will display a format or its attributes in a certain way (such as in uppercase) but your data might not display in MapServer; checking how OGR/GDAL reads your data file or database table, will help you manage the data. The OGR Vector Layers Through MapServer document has excellent examples (for ogrinfo and other commands) to connect to your data. Here is an example connecting to a PostGIS database, and returning a list of spatial tables:

ogrinfo -ro PG:"host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb"

      INFO: Open of `PG:host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb'
            using driver `PostgreSQL' successful.
      1: popplace (Point)
      2: province (Multi Polygon)

Then get a summary of the ‘popplace’ table through ogrinfo:

ogrinfo -ro PG:"host=127.0.0.1 user=postgres password=postgres port=5433 dbname=mydb" popplace -summary


    INFO: Open of `PG:host=127.0.0.1 user=postgres password=postgres port=5433 dbname=mydb'
          using driver `PostgreSQL' successful.

        Layer name: popplace
        Geometry: Point
        Feature Count: 497
        Extent: (-2303861.750000, -681502.875000) - (2961766.250000, 3798856.750000)
        Layer SRS WKT:
        PROJCS["NAD83 / Canada Atlas Lambert",
            GEOGCS["NAD83",
                DATUM["North_American_Datum_1983",
                    SPHEROID["GRS 1980",6378137,298.257222101,
                        AUTHORITY["EPSG","7019"]],
                    TOWGS84[0,0,0,0,0,0,0],
                    AUTHORITY["EPSG","6269"]],
                PRIMEM["Greenwich",0,
                    AUTHORITY["EPSG","8901"]],
                UNIT["degree",0.0174532925199433,
                    AUTHORITY["EPSG","9122"]],
                AUTHORITY["EPSG","4269"]],
            PROJECTION["Lambert_Conformal_Conic_2SP"],
            PARAMETER["standard_parallel_1",49],
            PARAMETER["standard_parallel_2",77],
            PARAMETER["latitude_of_origin",49],
            PARAMETER["central_meridian",-95],
            PARAMETER["false_easting",0],
            PARAMETER["false_northing",0],
            UNIT["metre",1,
                AUTHORITY["EPSG","9001"]],
            AXIS["Easting",EAST],
            AXIS["Northing",NORTH],
            AUTHORITY["EPSG","3978"]]
        FID Column = gid
        Geometry Column = geom
        area: Real (0.0)
        perimeter: Real (0.0)
        popplace_: Real (0.0)
        popplace_i: Real (0.0)
        unique_key: String (5.0)
        name: String (25.0)
        name_e: String (20.0)
        name_f: String (20.0)
        reg_code: Real (0.0)
        nts50: String (7.0)
        lat: String (7.0)
        long: String (7.0)
        sgc_code: Real (0.0)
        capital: Real (0.0)
        pop_range: Real (0.0)

Note

You can use the extent values returned from ogrinfo, to paste into your mapfile’s EXTENT parameter. You can also notice in that summary the PROJCS/AUTHORITY line, which states that this data is currently in the EPSG:3978 projection.

Learn & Review the various OGR utilities to manage your vectors

OGR commandline utilities are very useful, making it easy to manipulate your source vector file or database, before display in MapServer. For example, you might want to import a spatial file into your existing PostGIS database, which you can do easily through ogr2ogr as the following (that takes a shapefile, imports it into an existing PostGIS database, and renames the new table to ‘roadways’) :

ogr2ogr PG:"host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb" road.shp -nln roadways

Note

where ogr2ogr syntax is actually: ogr2ogr destination source

You might also import a spatial file, and reproject it to another EPSG projection, as the following (that takes a shapefile, reprojects it into the web mercator EPSG:3857, imports it into an existing PostGIS database, and renames the new table to ‘roadways3857’) :

ogr2ogr -t_srs EPSG:3857 -s_srs EPSG:3978 PG:"host=127.0.0.1 user=postgres password=postgres port=5432 dbname=mydb" road.shp -nln roadways3857

Note

where ogr2ogr syntax is actually: ogr2ogr -t_srs (ouput projection) -s_srs (source projection) destination source

Review all of the available OGR vector utilities here.

Index your data

This of course is an important step for any vector layer, for fast display in MapServer. There are several types of indexes that you can enable to speed up the display of your vector data with MapServer:

Tileindexing

See Tile Indexes for more detailed information about on-the-fly mosaicing with MapServer.

Add a Spatial Index to your data

Make sure that your geometry has a spatial index.

  • Shapefile: see shptree

  • See below for databases (and GPKG)

Add an Attribute Index to your data

If you will be querying or filtering by a specific column in your data, you should setup an attribute index on your vector data.

Shapefiles:

ogrinfo -sql  "CREATE INDEX ON province USING NAME_E" province.shp

2 files should be created with extensions: .ind & .idm

GeoPackage:

ogrinfo -sql "CREATE INDEX IDXnewindexname ON yourtable (yourcolumn)" file.gpkg

Splitting your data

If you find yourself making several layers, all of them using the same dataset but filtering to only use some of the records, you could probably do it better. If the criteria are static, one approach is to pre-split the data.

The ogr2ogr utility can select on certain features from a datasource, and save them to a new data source. Thus, you can split your dataset into several smaller ones that are already effectively filtered, and remove the FILTER statement.

If you are using shapefiles, the shp2tile utility is a wonderful commandline tool for this.

Note

For Windows users, MS4W includes the shp2tile utility, and all utilities mentioned here.

Handling your vector LAYERS in the mapfile

Please review the notes in the document Mapfile Tuning & Management. You should also check for any specific MapServer notes for your format in the Vector Data document.

Shapefile Notes

Use shptree to generate a spatial index on your shapefile. This is quick and easy (“shptree foo.shp”) and generates a .qix file. MapServer will automagically detect an index and use it.

Note

Tileindexes can also be indexed with shptree.

MapServer also comes with the sortshp utility. This reorganizes a shapefile, sorting it according to the values in one of its columns. If you’re commonly filtering by criteria and it’s almost always by a specific column, this can make the process slightly more efficient.

Although shapefiles are a very fast data format, PostGIS is pretty speedy as well, especially if you use indexes well and have memory to throw at caching.

PostGIS Notes

Indexing with PostGIS

The single biggest boost to performance is indexing. Make sure that there’s a GIST index on the geometry column, and each record should also have an indexed primary key. If you used shp2pgsql, then these statements should create the necessary indexes:

ALTER TABLE table ADD PRIMARY KEY (gid);
CREATE INDEX table_the_geom ON table (the_geom) USING GIST;

PostgreSQL also supports reorganizing the data in a table, such that it’s physically sorted by the index. This allows PostgreSQL to be much more efficient in reading the indexed data. Use the CLUSTER command, e.g.

CLUSTER the_geom ON table;

Then there are numerous optimizations one can perform on the database server itself, aside from the geospatial component. The easiest is to increase shared_buffers in the postgresql.conf file, which allows PostgreSQL to use more memory for caching. It is worth the time to investigate the various options in the Resource Consumption section of the PostgreSQL documentation.

Specifying a Unique ID column

To handle queries, MapServer requires a unique ID column, as part of your PostgreSQL table. MapServer will try to guess the unique ID column, but that is costly as several more database queries have to be made; instead, you should always specify a unique ID column in your DATA statement of the layer, with the using unique syntax, such as:

DATA "geom FROM mydata USING UNIQUE myid USING SRID=3857"

If your table does not have a unique ID column, you can add one such as:

ALTER TABLE mytable ADD COLUMN unique_id SERIAL PRIMARY KEY;

Warning

In older PostgreSQL versions, a (dirty) trick was to use the existing OID as the unique ID, however OIDs were removed from PostgreSQL as of the PostgreSQL 12.0 release. So always specify an actual unique ID column instead, in your layer’s DATA statement, of your mapfile.

Debugging speed issues with PostGIS

You may face a situation where your PostGIS table is drawing slowly in MapServer. The following steps will help you examine the issue (using a WFS case) :

  • Always start by getting the draw time for your layer, with a shp2img command:

    shp2img -m postgis-wfs.map -o ttt.png -map_debug 3
    
      msDrawMap(): Layer 0 (provinces), 1.587s
      msDrawMap(): Drawing Label Cache, 0.000s
      msDrawMap() total time: 1.589s
      msSaveImage(ttt.png) total time: 0.005s
      freeLayer(): freeing layer at 010E0EC0.
      msPostGISLayerIsOpen called.
      msConnPoolClose(host=127.0.0.1 user=postgres password=postgres port=5432 dbname=gmap,01197840)
    
  • To diagnose if the problem is with your PostGIS table configuration, or with MapServer, execute the exact request sent by MapServer, at the psql.exe commandline, by doing the following steps:

    • add into your MAP-level of your mapfile:

      CONFIG "CPL_DEBUG" "ON"
      CONFIG "MS_ERRORFILE" "/ms4w/tmp/ms_error.txt"
      DEBUG 5
      
    • add into your PostGIS LAYER of your mapfile:

      DEBUG 5
      
    • now use a WFS client such as QGIS and add your WFS PostGIS layer

    • open “/ms4w/tmp/ms_error.txt” in Notepad++

    • search for “msPostGISLayerWhichShapes query:”

    • that line should list the long exact query sent from MapServer to the PostreSQL instance, it may look like:

      [Mon Apr 12 11:27:34 2021].207000 msPostGISLayerWhichShapes query:
       SELECT "gid"::text,"area"::text,"perimeter"::text,"province_"::text,"province_i"::text,"status"::text,"name"::text,"name_e"::text,"name_f"::text,"reg_code"::text,"poly_featu"::text,"island"::text,"island_e"::text,"island_f"::text,ST_AsBinary(("geom"),'NDR')
       as geom,"gid"::text FROM province WHERE "geom" &&
       ST_GeomFromText('POLYGON((-5814679.36987815
       -1504714.04276694,-5814679.36987815 4439806.52253364,5943763.33635122
       4439806.52253364,5943763.33635122 -1504714.04276694,-5814679.36987815
       -1504714.04276694))',3978) LIMIT 2 OFFSET 0
      

      Note

      For one QGIS action, there could be several queries sent to MapServer, so also look for other “msPostGISLayerWhichShapes query:” instances in that error log.

    • now connect to that database through psql.exe

      psql -U postgres -p 5432 -d mydb
      
    • using that error file line, grab everything from “SELECT”, and inside your database prompt, start the command with “EXPLAIN ANALYZE” and then paste your full query, such as:

      mydb=# EXPLAIN ANALYZE SELECT
      "gid"::text,"area"::text,"perimeter"::text,"province_"::text,"province_i"::text,"status"::text,"name"::text,"name_e"::text,"name_f"::text,"reg_code"::text,"poly_featu"::text,"island"::text,"island_e"::text,"island_f"::text,ST_AsBinary(("geom"),'NDR')
      as geom,"gid"::text FROM province WHERE "geom" &&
      ST_GeomFromText('POLYGON((-5814679.36987815
      -1504714.04276694,-5814679.36987815 4439806.52253364,5943763.33635122
      4439806.52253364,5943763.33635122 -1504714.04276694,-5814679.36987815
      -1504714.04276694))',3978) LIMIT 2 OFFSET 0;
      
    • the response will tell you how long that query took, such as:

      Execution time: 0.293 ms
      
    • if the query takes a long time to execute at the psql commandline, then you know to focus your efforts on improving the indexing/settings of your PostGIS table.

Databases in General (PostGIS, Oracle, SpatiaLite, GeoPackage, Microsoft SQL Server, MySQL)

Enable Connection Pooling

By default, MapServer opens and closes a new database connection for each database-driven layer in the mapfile. If you have several layers reading from the same database, this doesn’t make a lot of sense. And with some databases (such as Oracle) establishing connections takes enough time that it can become significant.

Try adding this line to your database layers:

PROCESSING "CLOSE_CONNECTION=DEFER"

This causes MapServer to not close the database connection for each layer until after it has finished processing the mapfile and this may shave a few seconds off of map generation times.

Set EXTENT at the LAYER level of mapfile

Also for performance, each of your LAYERs with a database connection should have EXTENT set at the layer level (as well as ows_extent metadata if you are serving through OGC services), such as:

/* my database layer */
LAYER
  NAME "provinces"
  METADATA
    "wms_title" "Land"
    "wms_extent" "-2340603.75 -719746.0625 3009430.5 3836605.25" #this helps for performance
  END #metadata
  TYPE POLYGON
  STATUS ON
  CONNECTIONTYPE postgis
  CONNECTION "host=127.0.0.1 user=postgres password=postgres port=5432 dbname=gmap"
  DATA "geom FROM province USING unique gid using srid=3978"
  EXTENT -2340603.75 -719746.0625 3009430.5 3836605.25 #this helps for performance
  PROJECTION
    "init=epsg:3978"
  END # projection
  ...
END # layer

Tip

PostGIS users can use the ST_Extent() spatial function to get the bounding box of a table, such as

SELECT ST_Extent(geom) as table_extent FROM province;


              table_extent
----------------------------------------------------
    BOX(-2340603.75 -719746.0625,3009430.5 3836605.25)
(1 row)

Verify that your table has a Spatial Index

PostgreSQL/PostGIS example

Once connected through psql commandline, describe the table and look for an “Indexes” section, mentioning geometry, such as:

Indexes:
    "province_pkey" PRIMARY KEY, btree (ogc_fid)
    "province_wkb_geometry_geom_idx" gist (wkb_geometry)

GeoPackage example

You can use ogrinfo to verify that your GPKG table’s geometry has a spatial index, which is very important for performance (look for HasSpatialIndex (Integer) = 1 to confirm that the spatial index exists) :

ogrinfo -sql "SELECT HasSpatialIndex('countries', 'GEOMETRY')" countries.gpkg

  Layer name: SELECT
  Geometry: Unknown (any)
  Feature Count: 1
  Layer SRS WKT:
  (unknown)
  HasSpatialIndex: Integer (0.0)
  OGRFeature(SELECT):0
    HasSpatialIndex (Integer) = 1

and then add a spatial index with ogrinfo :

ogrinfo -sql "SELECT CreateSpatialIndex('parcelle_graphique', 'geom')" PARCELLES_GRAPHIQUES.gpkg

For WFS services, prevent using default/full extent

For your database connections, MapServer may (by default) use the full extents of your data while performing a query (which can cause slow responses >30seconds). You should likely set the following layer METADATA to disable this:

METADATA
  "wfs_use_default_extent_for_getfeature" "false"
END

Here is a fully enhanced layer example:

LAYER
  NAME "CLUS"
  METADATA
    "ows_title"                   "CLUS"
    "ows_srs"                     "EPSG:3857 EPSG:4326 EPSG:2154"
    "wfs_getfeature_formatlist"   "geojson"
    "wfs_geomtype"                "MultiPolygon"
    "gml_featureid"               "fid"
    "gml_include_items"           "all"
    "wfs_extent"                  "115134 6049690 1242200 7108930" #set to improve performance for all DB layers
    "wfs_use_default_extent_for_getfeature" "false" #set to improve performance for all DB layers
    "ows_enable_request"          "*"
  END #metadata
  TYPE POLYGON
  CONNECTIONTYPE OGR
  CONNECTION "C:/ms4w/apps/RPG_2-0_GPKG_LAMB93_FR-2019/PARCELLES_GRAPHIQUES.gpkg"
  DATA "parcelle_graphique"
  EXTENT 115134 6049690 1242200 7108930 #set to improve performance for all DB layers
  PROCESSING "CLOSE_CONNECTION=DEFER" #set to improve performance for all DB layers
  STATUS ON
  PROJECTION
    "init=epsg:2154"
  END #proj
  COMPOSITE
    OPACITY 100
  END #composite
  CLASS
    NAME "CLUS"
    STYLE
      OUTLINECOLOR 255 0 255
    END #style
  END #class
END #layer