mirror of
https://git.osgeo.org/gitea/postgis/postgis
synced 2024-10-24 09:02:37 +00:00
e1d253e2eb
git-svn-id: http://svn.osgeo.org/postgis/trunk@9067 b70326c6-7e19-0410-871a-916f4a2858ee
289 lines
12 KiB
XML
289 lines
12 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<chapter>
|
|
<title>Performance tips</title>
|
|
|
|
<sect1>
|
|
<title>Small tables of large geometries</title>
|
|
|
|
<sect2>
|
|
<title>Problem description</title>
|
|
|
|
<para>Current PostgreSQL versions (including 8.0) suffer from a query
|
|
optimizer weakness regarding TOAST tables. TOAST tables are a kind of
|
|
"extension room" used to store large (in the sense of data size) values
|
|
that do not fit into normal data pages (like long texts, images or
|
|
complex geometries with lots of vertices), see
|
|
<ulink url="http://www.postgresql.org/docs/current/static/storage-toast.html">the PostgreSQL Documentation for TOAST</ulink> for more
|
|
information).</para>
|
|
|
|
<para>The problem appears if you happen to have a table with rather
|
|
large geometries, but not too much rows of them (like a table containing
|
|
the boundaries of all European countries in high resolution). Then the
|
|
table itself is small, but it uses lots of TOAST space. In our example
|
|
case, the table itself had about 80 rows and used only 3 data pages, but
|
|
the TOAST table used 8225 pages.</para>
|
|
|
|
<para>Now issue a query where you use the geometry operator &&
|
|
to search for a bounding box that matches only very few of those rows.
|
|
Now the query optimizer sees that the table has only 3 pages and 80
|
|
rows. He estimates that a sequential scan on such a small table is much
|
|
faster than using an index. And so he decides to ignore the GIST index.
|
|
Usually, this estimation is correct. But in our case, the &&
|
|
operator has to fetch every geometry from disk to compare the bounding
|
|
boxes, thus reading all TOAST pages, too.</para>
|
|
|
|
<para>To see whether your suffer from this bug, use the "EXPLAIN
|
|
ANALYZE" postgresql command. For more information and the technical
|
|
details, you can read the thread on the postgres performance mailing
|
|
list:
|
|
http://archives.postgresql.org/pgsql-performance/2005-02/msg00030.php</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Workarounds</title>
|
|
|
|
<para>The PostgreSQL people are trying to solve this issue by making the
|
|
query estimation TOAST-aware. For now, here are two workarounds:</para>
|
|
|
|
<para>The first workaround is to force the query planner to use the
|
|
index. Send "SET enable_seqscan TO off;" to the server before issuing
|
|
the query. This basically forces the query planner to avoid sequential
|
|
scans whenever possible. So it uses the GIST index as usual. But this
|
|
flag has to be set on every connection, and it causes the query planner
|
|
to make misestimations in other cases, so you should "SET enable_seqscan
|
|
TO on;" after the query.</para>
|
|
|
|
<para>The second workaround is to make the sequential scan as fast as
|
|
the query planner thinks. This can be achieved by creating an additional
|
|
column that "caches" the bbox, and matching against this. In our
|
|
example, the commands are like:</para>
|
|
|
|
<programlisting>SELECT AddGeometryColumn('myschema','mytable','bbox','4326','GEOMETRY','2');
|
|
UPDATE mytable SET bbox = ST_Envelope(ST_Force_2d(the_geom));</programlisting>
|
|
|
|
<para>Now change your query to use the && operator against bbox
|
|
instead of geom_column, like:</para>
|
|
|
|
<programlisting>SELECT geom_column
|
|
FROM mytable
|
|
WHERE bbox && ST_SetSRID('BOX3D(0 0,1 1)'::box3d,4326);</programlisting>
|
|
|
|
<para>Of course, if you change or add rows to mytable, you have to keep
|
|
the bbox "in sync". The most transparent way to do this would be
|
|
triggers, but you also can modify your application to keep the bbox
|
|
column current or run the UPDATE query above after every
|
|
modification.</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>CLUSTERing on geometry indices</title>
|
|
|
|
<para>For tables that are mostly read-only, and where a single index is
|
|
used for the majority of queries, PostgreSQL offers the CLUSTER command.
|
|
This command physically reorders all the data rows in the same order as
|
|
the index criteria, yielding two performance advantages: First, for index
|
|
range scans, the number of seeks on the data table is drastically reduced.
|
|
Second, if your working set concentrates to some small intervals on the
|
|
indices, you have a more efficient caching because the data rows are
|
|
spread along fewer data pages. (Feel invited to read the CLUSTER command
|
|
documentation from the PostgreSQL manual at this point.)</para>
|
|
|
|
<para>However, currently PostgreSQL does not allow clustering on PostGIS
|
|
GIST indices because GIST indices simply ignores NULL values, you get an
|
|
error message like:</para>
|
|
|
|
<programlisting>lwgeom=# CLUSTER my_geom_index ON my_table;
|
|
ERROR: cannot cluster when index access method does not handle null values
|
|
HINT: You may be able to work around this by marking column "the_geom" NOT NULL.</programlisting>
|
|
|
|
<para>As the HINT message tells you, one can work around this deficiency
|
|
by adding a "not null" constraint to the table:</para>
|
|
|
|
<programlisting>lwgeom=# ALTER TABLE my_table ALTER COLUMN the_geom SET not null;
|
|
ALTER TABLE</programlisting>
|
|
|
|
<para>Of course, this will not work if you in fact need NULL values in
|
|
your geometry column. Additionally, you must use the above method to add
|
|
the constraint, using a CHECK constraint like "ALTER TABLE blubb ADD CHECK
|
|
(geometry is not null);" will not work.</para>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Avoiding dimension conversion</title>
|
|
|
|
<para>Sometimes, you happen to have 3D or 4D data in your table, but
|
|
always access it using OpenGIS compliant ST_AsText() or ST_AsBinary()
|
|
functions that only output 2D geometries. They do this by internally
|
|
calling the ST_Force_2d() function, which introduces a significant
|
|
overhead for large geometries. To avoid this overhead, it may be feasible
|
|
to pre-drop those additional dimensions once and forever:</para>
|
|
|
|
<programlisting>UPDATE mytable SET the_geom = ST_Force_2d(the_geom);
|
|
VACUUM FULL ANALYZE mytable;</programlisting>
|
|
|
|
<para>Note that if you added your geometry column using
|
|
AddGeometryColumn() there'll be a constraint on geometry dimension. To
|
|
bypass it you will need to drop the constraint. Remember to update the
|
|
entry in the geometry_columns table and recreate the constraint
|
|
afterwards.</para>
|
|
|
|
<para>In case of large tables, it may be wise to divide this UPDATE into
|
|
smaller portions by constraining the UPDATE to a part of the table via a
|
|
WHERE clause and your primary key or another feasible criteria, and
|
|
running a simple "VACUUM;" between your UPDATEs. This drastically reduces
|
|
the need for temporary disk space. Additionally, if you have mixed
|
|
dimension geometries, restricting the UPDATE by "WHERE
|
|
dimension(the_geom)>2" skips re-writing of geometries that already are
|
|
in 2D.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Tuning your configuration</title>
|
|
|
|
<para>These tips are taken from Kevin Neufeld's presentation "Tips for the
|
|
PostGIS Power User" at the FOSS4G 2007 conference. Depending on your
|
|
use of PostGIS (for example, static data and complex analysis vs frequently
|
|
updated data and lots of users) these changes can provide significant
|
|
speedups to your queries.</para>
|
|
|
|
<para>For a more tips (and better formatting), the original presentation
|
|
is at
|
|
<ulink url="http://2007.foss4g.org/presentations/view.php?abstract_id=117">
|
|
http://2007.foss4g.org/presentations/view.php?abstract_id=117</ulink>.
|
|
</para>
|
|
|
|
<sect2>
|
|
<title>Startup</title>
|
|
|
|
<para>
|
|
These settings are configured in postgresql.conf:
|
|
</para>
|
|
|
|
<para>
|
|
<ulink url="http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-CHECKPOINT-SEGMENTS">checkpoint_segments</ulink>
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Maximum number of log file segments between automatic WAL checkpoints
|
|
(each segment is normally 16MB); default is 3
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Set to at least 10 or 30 for databases with heavy write activity, or
|
|
more for large database loads. Another article on the topic worth reading <ulink url="http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm">Greg Smith: Checkpoint and Background writer</ulink>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Possibly store the xlog on a separate disk device
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
<ulink url="http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION">constraint_exclusion</ulink>
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Default: off (prior to PostgreSQL 8.4 and for PostgreSQL 8.4+ is set to partition)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
This is generally used for table partitioning. If you are running PostgreSQL versions below 8.4, set to "on" to ensure the query planner will optimize as desired.
|
|
As of PostgreSQL 8.4, the default for this is set to "partition" which is ideal for PostgreSQL 8.4 and above since
|
|
it will force the planner to only analyze tables for constraint consideration if they are in an inherited hierarchy
|
|
and not pay the planner penalty otherwise.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
<ulink url="http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-SHARED-BUFFERS">shared_buffers</ulink>
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Default: ~32MB
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Set to about 1/3 to 3/4 of available RAM
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Runtime</title>
|
|
|
|
<para>
|
|
<ulink url="http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-WORK-MEM">work_mem</ulink> (the memory used for sort operations and complex queries)
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Default: 1MB
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Adjust up for large dbs, complex queries, lots of RAM
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Adjust down for many concurrent users or low RAM.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
If you have lots of RAM and few developers:
|
|
<programlisting>
|
|
SET work_mem TO 1200000;
|
|
</programlisting>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
<ulink url="http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM">maintenance_work_mem</ulink> (used for VACUUM, CREATE INDEX, etc.)
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Default: 16MB
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Generally too low - ties up I/O, locks objects while swapping memory
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Recommend 32MB to 256MB on production servers w/lots of RAM, but depends
|
|
on the # of concurrent users. If you have lots of RAM and few developers:
|
|
<programlisting>
|
|
SET maintainence_work_mem TO 1200000;
|
|
</programlisting>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
</chapter>
|