<?xml version="1.0" encoding="UTF-8"?>
<chapter>
<title>Performance tips</title>

<sect1>
<title>Small tables of large geometries</title>

<sect2>
<title>Problem description</title>

<para>Current PostgreSQL versions (including 8.0) suffer from a query
optimizer weakness regarding TOAST tables. TOAST tables are a kind of
"extension room" used to store large (in the sense of data size) values
that do not fit into normal data pages, such as long texts, images, or
complex geometries with many vertices (see
http://www.postgresql.org/docs/8.0/static/storage-toast.html for more
information).</para>

<para>The problem appears when a table has rather large geometries but
relatively few rows of them (for example, a table containing the
high-resolution boundaries of all European countries). The table itself
is then small, but it uses lots of TOAST space. In our example case, the
table itself had about 80 rows and used only 3 data pages, but its TOAST
table used 8225 pages.</para>
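
<para>To check whether one of your tables is in this situation, you can compare the
page count of the table with that of its TOAST table. The following query is a
sketch that reads the planner's page statistics from the pg_class system catalog.
The name mytable is a placeholder for your table. The relpages figures are only as
current as the last VACUUM or ANALYZE on the table.</para>

<programlisting>-- Page counts as recorded by the last VACUUM/ANALYZE (mytable is a placeholder)
SELECT c.relname  AS table_name,
       c.relpages AS table_pages,
       t.relname  AS toast_table,
       t.relpages AS toast_pages
  FROM pg_class c
  JOIN pg_class t ON t.oid = c.reltoastrelid
 WHERE c.relname = 'mytable';</programlisting>

<para>A large toast_pages value next to a very small table_pages value is the
pattern described above.</para>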

<para>Now issue a query that uses the geometry operator &amp;&amp; to search
for a bounding box that matches only very few of those rows. The query
optimizer sees that the table has only 3 pages and 80 rows. It estimates
that a sequential scan on such a small table is much faster than using
an index, and therefore ignores the GIST index. Usually this estimate is
correct. In our case, however, the &amp;&amp; operator has to fetch every
geometry from disk to compare the bounding boxes, and so it reads all
the TOAST pages, too.</para>

<para>To see whether you suffer from this bug, use the PostgreSQL
"EXPLAIN ANALYZE" command. For more information and the technical
details, see the thread on the PostgreSQL performance mailing list:
http://archives.postgresql.org/pgsql-performance/2005-02/msg00030.php</para>
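
<para>For instance, you can run the bounding box query under EXPLAIN ANALYZE. The
output shows which plan the optimizer chose and how long the query actually took.
The table name mytable, the geometry column the_geom and the box coordinates below
are placeholders.</para>

<programlisting>-- Show the chosen plan together with the actual run time
EXPLAIN ANALYZE
SELECT the_geom
  FROM mytable
 WHERE the_geom &amp;&amp; ST_SetSRID('BOX3D(0 0,1 1)'::box3d, 4326);</programlisting>

<para>If the output shows a sequential scan ("Seq Scan on mytable") rather than a
scan using your GIST index, and the actual run time is long although only a few
rows are returned, you are most likely affected.</para>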
</sect2>

<sect2>
<title>Workarounds</title>

<para>The PostgreSQL developers are trying to solve this issue by making
the query estimation TOAST-aware. For now, there are two
workarounds:</para>

<para>The first workaround is to force the query planner to use the
index. Send "SET enable_seqscan TO off;" to the server before issuing
the query. This makes the planner avoid sequential scans whenever
possible, so it uses the GIST index as usual. However, this flag has to
be set separately on every connection, and it can lead the planner to
bad plans for other queries. You should therefore send
"SET enable_seqscan TO on;" after the query.</para>
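
<para>A minimal sketch of this workaround, using placeholder names (mytable,
the_geom) and a placeholder box. The second variant uses SET LOCAL inside a
transaction. PostgreSQL reverts that setting automatically at COMMIT or ROLLBACK,
so you do not have to switch it back on by hand.</para>

<programlisting>-- Variant 1: switch sequential scans off around a single query
SET enable_seqscan TO off;
SELECT the_geom
  FROM mytable
 WHERE the_geom &amp;&amp; ST_SetSRID('BOX3D(0 0,1 1)'::box3d, 4326);
SET enable_seqscan TO on;

-- Variant 2: SET LOCAL only lasts until the end of the current transaction
BEGIN;
SET LOCAL enable_seqscan TO off;
SELECT the_geom
  FROM mytable
 WHERE the_geom &amp;&amp; ST_SetSRID('BOX3D(0 0,1 1)'::box3d, 4326);
COMMIT;</programlisting>

<para>The SET LOCAL variant is more robust on shared or pooled connections. If the
query fails, the transaction is rolled back and the setting with it, so
sequential scans cannot stay disabled for the rest of the session.</para>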
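
<para>The second workaround is to make the sequential scan as fast as
the query planner thinks it is. This can be achieved by creating an
additional column that "caches" the bounding box, and matching against
that column instead. The bounding boxes are small enough to be stored in
the table's own data pages rather than in the TOAST table, so scanning
them stays cheap. In our example, the commands are:</para>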

<programlisting>SELECT AddGeometryColumn('myschema','mytable','bbox',4326,'GEOMETRY',2);
UPDATE mytable SET bbox = ST_Envelope(ST_Force_2d(the_geom));</programlisting>

<para>Now change your query to use the &amp;&amp; operator against bbox
instead of the_geom, like:</para>

<programlisting>SELECT the_geom
FROM mytable
WHERE bbox &amp;&amp; ST_SetSRID('BOX3D(0 0,1 1)'::box3d,4326);</programlisting>

<para>Of course, if you change or add rows in mytable, you have to keep
the bbox column "in sync". The most transparent way to do this is with
a trigger. You can also modify your application to keep the bbox column
current, or run the UPDATE query above after every
modification.</para>
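
<para>A trigger-based sketch follows. It assumes the column names used above
(the_geom, bbox) and a hypothetical function and trigger name,
mytable_bbox_sync. A BEFORE trigger recomputes the cached bbox for every inserted
or updated row. The UPDATE shown earlier then only needs to run once, to fill in
the existing rows. The PL/pgSQL language must be enabled in your database.</para>

<programlisting>-- Hypothetical trigger function: recompute the cached bbox from the_geom
CREATE OR REPLACE FUNCTION mytable_bbox_sync() RETURNS trigger AS $$
BEGIN
    NEW.bbox := ST_Envelope(ST_Force_2d(NEW.the_geom));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fire it before each INSERT or UPDATE so the stored row already carries its bbox
CREATE TRIGGER mytable_bbox_sync
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_bbox_sync();</programlisting>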
</sect2>
</sect1>

<sect1>
<title>CLUSTERing on geometry indices</title>

<para>For tables that are mostly read-only, and where a single index is
used for the majority of queries, PostgreSQL offers the CLUSTER command.
This command physically reorders all the data rows in the order of the
index, which yields two performance advantages. First, for index range
scans, the number of seeks on the data table is drastically reduced.
Second, if your working set concentrates on a few small ranges of the
index, caching is more efficient because the rows you need are spread
over fewer data pages. (Feel invited to read the CLUSTER command
documentation from the PostgreSQL manual at this point.)</para>

<para>However, PostgreSQL currently does not allow clustering on PostGIS
GIST indices, because GIST indices simply ignore NULL values. If you
try, you get an error message like:</para>

<programlisting>lwgeom=# CLUSTER my_geom_index ON my_table;
ERROR: cannot cluster when index access method does not handle null values
HINT: You may be able to work around this by marking column "the_geom" NOT NULL.</programlisting>

<para>As the HINT message tells you, you can work around this deficiency
by adding a NOT NULL constraint to the geometry column:</para>

<programlisting>lwgeom=# ALTER TABLE my_table ALTER COLUMN the_geom SET NOT NULL;
ALTER TABLE</programlisting>

<para>Of course, this will not work if you actually need NULL values in
your geometry column. Additionally, you must use the above method to
add the constraint. A CHECK constraint such as "ALTER TABLE blubb ADD
CHECK (geometry is not null);" will not work.</para>
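
<para>A sketch of the complete sequence follows, using the names my_table,
the_geom and my_geom_index from the examples above. The first query checks that
no NULL geometries are left, because SET NOT NULL fails otherwise. The final
ANALYZE refreshes the planner statistics for the reordered table.</para>

<programlisting>-- Must return 0, otherwise SET NOT NULL below will fail
SELECT count(*) FROM my_table WHERE the_geom IS NULL;

ALTER TABLE my_table ALTER COLUMN the_geom SET NOT NULL;

-- Physically reorder the table along the GIST index (takes an exclusive lock)
CLUSTER my_geom_index ON my_table;
ANALYZE my_table;</programlisting>

<para>CLUSTER is a one-time operation. Rows that are added or changed later are
not kept in index order, so you may want to re-run it periodically on tables that
do change. Newer PostgreSQL releases also accept the form
"CLUSTER my_table USING my_geom_index".</para>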
</sect1>

<sect1>
<title>Avoiding dimension conversion</title>

<para>Sometimes you have 3D or 4D data in your table, but always access
it using the OpenGIS-compliant ST_AsText() or ST_AsBinary() functions,
which only output 2D geometries. They do this by internally calling the
ST_Force_2d() function, which introduces a significant overhead for
large geometries. To avoid this overhead, it may be feasible to drop
those additional dimensions in advance, once and for all:</para>

<programlisting>UPDATE mytable SET the_geom = ST_Force_2d(the_geom);
VACUUM FULL ANALYZE mytable;</programlisting>

<para>Note that if you added your geometry column using
AddGeometryColumn(), there will be a constraint on the geometry
dimension. To bypass it, you will need to drop the constraint. Remember
to update the entry in the geometry_columns table and to recreate the
constraint afterwards.</para>
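
<para>A sketch of these steps follows. It assumes the constraint created by
AddGeometryColumn() is named enforce_dims_the_geom, which is how PostGIS 1.x
typically names it. Check the actual name with "\d mytable" in psql before running
it. The schema name myschema is also a placeholder. VACUUM cannot run inside a
transaction block, so it comes after the COMMIT.</para>

<programlisting>BEGIN;
-- Drop the dimension constraint added by AddGeometryColumn() (name assumed)
ALTER TABLE mytable DROP CONSTRAINT enforce_dims_the_geom;

UPDATE mytable SET the_geom = ST_Force_2d(the_geom);

-- Record the new coordinate dimension in the metadata table
UPDATE geometry_columns
   SET coord_dimension = 2
 WHERE f_table_schema = 'myschema'
   AND f_table_name = 'mytable'
   AND f_geometry_column = 'the_geom';

-- Recreate the constraint for the new dimension
ALTER TABLE mytable ADD CONSTRAINT enforce_dims_the_geom
    CHECK (ST_NDims(the_geom) = 2);
COMMIT;

VACUUM FULL ANALYZE mytable;</programlisting>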

<para>For large tables, it may be wise to divide this UPDATE into
smaller portions. Restrict each UPDATE to a part of the table with a
WHERE clause on your primary key or another suitable criterion, and run
a simple "VACUUM;" between the UPDATEs. This drastically reduces the
need for temporary disk space. Additionally, if you have geometries of
mixed dimensions, restricting the UPDATE with
"WHERE ST_NDims(the_geom) > 2" skips rewriting geometries that are
already 2D.</para>
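
<para>For example, the conversion could be split into batches as sketched below.
This assumes an integer primary key column named gid, which is a placeholder; use
whatever key your table has. Each VACUUM marks the space of the row versions
replaced by the previous batch as reusable. The next batch can then fill that
space instead of growing the table file.</para>

<programlisting>-- Batch 1: only rewrite rows that actually carry more than two dimensions
UPDATE mytable SET the_geom = ST_Force_2d(the_geom)
 WHERE gid BETWEEN 1 AND 10000
   AND ST_NDims(the_geom) > 2;
VACUUM mytable;

-- Batch 2: the next key range, and so on until the whole table is done
UPDATE mytable SET the_geom = ST_Force_2d(the_geom)
 WHERE gid BETWEEN 10001 AND 20000
   AND ST_NDims(the_geom) > 2;
VACUUM mytable;</programlisting>

<para>The batch size of 10000 rows is arbitrary. Choose it according to the size
of your geometries and the disk space you have available.</para>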
</sect1>
</chapter>