Commit 4ff37cd

Document fulltext search with trigram backend
1 parent 144a94b commit 4ff37cd

1 file changed

+99
-55
lines changed

src/topics/Search.md

Lines changed: 99 additions & 55 deletions
@@ -207,12 +207,94 @@ In addition to the configuration described above, you can specify these addition
207207

208208
## Configuring the fulltext search service <a name="fulltext-search"></a>
209209

210-
### Solr configuration
210+
The [`qwc-fulltext-search-service`](https://github.com/qwc-services/qwc-fulltext-search-service) provides faceted fulltext search, with one of the following backends:
211211

212-
Before the fulltext search service can be configured, a new solr configuration file must be created.
213-
This file must be created in `volumes/solr/configsets/gdi/conf/`.
214-
The name of the file can be chosen freely.
215-
Here is an example XML file:
212+
* Postgres Trigram
213+
* Apache Solr
214+
215+
A facet references a searchable dataset. The configuration of the fulltext search service and available search facets can be found in `tenantConfig.json`:
216+
217+
```json
218+
{
219+
"name": "search",
220+
"config": {
221+
"search_backend": "<solr|trgm>",
222+
"word_split_re": "[\\s,.:;\"]+",
223+
"search_result_limit": 50,
224+
"db_url": "postgresql:///?service=qwc_geodb",
225+
// trgm specific configuration, see below
226+
"trgm_feature_query": "<see below>",
227+
"trgm_layer_query": "<see below>",
228+
"trgm_similarity_threshold": "0.3",
229+
// solr specific configuration, see below
230+
"solr_service_url": "http://localhost:8983/solr/gdi/select",
231+
"search_result_sort": "score desc, sort asc"
232+
},
233+
"resources": {
234+
"facets": [
235+
{
236+
"name": "<facet name>",
237+
"filter_word": "<filter word>",
238+
"table_name": "<schema.tablename>",
239+
"geometry_column": "<geometry column name>"
240+
},
241+
...
242+
]
243+
}
244+
}
245+
```
246+
247+
- The `search_backend` specifies the search backend to use, either `solr` or `trgm`. Default: `solr`.
248+
- The `db_url` specifies the DB which contains the search index (searched either by `solr` or by the specified `trgm` queries).
249+
- The `word_split_re` specifies the regular expression which is used to split the search string into single words. Default: `[\\s,.:;\"]+`.
250+
- `search_result_limit` specifies the maximum number of feature results returned by a search. Default: `50`.
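As an illustration of how `word_split_re` could be applied, here is a minimal Python sketch that splits a search string with the default pattern. The function name `split_search_text` is hypothetical and not taken from the service; only the pattern itself comes from the configuration above.

```python
import re

# Default pattern from the configuration above, after JSON unescaping.
WORD_SPLIT_RE = r'[\s,.:;"]+'


def split_search_text(text, pattern=WORD_SPLIT_RE):
    """Split a search string into words, dropping empty fragments."""
    return [word for word in re.split(pattern, text) if word]


print(split_search_text('Bahnhofstrasse 12, "Olten"'))
# ['Bahnhofstrasse', '12', 'Olten']
```

Empty fragments are filtered out because `re.split` yields an empty string when the text starts or ends with a separator.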
251+
252+
Each facet describes a searchable dataset and is referenced by the search index:
253+
254+
- `name` specifies the facet identifier.
255+
- `filter_word` is a short (human readable) name which appears as result category in the search results (e.g. `Address`).
256+
- `table_name` specifies the table containing the features referenced by the search index (in the format `schema.table_name`).
257+
- `geometry_column` specifies the name of the geometry column in this table.
258+
259+
260+
### Fulltext search with Trigram backend
261+
262+
To configure a fulltext search with the trigram backend, set `search_backend` to `trgm` and specify a `trgm_feature_query` and optionally a `trgm_layer_query`. The feature and layer query SQL can contain the following placeholders:
263+
264+
- `:term`: The full search text
265+
- `:terms`: A list of search text words (i.e. the full search text split by whitespace).
266+
- `:thres`: The trigram similarity threshold value (note that the service will also separately execute `SET pg_trgm.similarity_threshold = <value>`).
267+
268+
The `trgm_feature_query` must return the following fields:
269+
270+
* `display`: The label to display in the search results.
271+
* `facet_id`: The facet name (as configured in `resources` => `facets`).
272+
* `id_field_name`: The name of the identifier field in the table referenced by the facet.
273+
* `feature_id`: The feature identifier through which to locate the feature in the table referenced by the facet.
274+
* `bbox`: The feature bounding box, as a `[xmin,ymin,xmax,ymax]` string.
275+
* `srid`: The SRID of the bbox coordinates (e.g. `3857`).
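To make the expected field formats concrete, the following Python sketch converts one result row into typed values. It is only an illustration: the helper `parse_feature_row`, the output keys, and the sample row are hypothetical and do not describe how the service itself processes rows.

```python
import json


def parse_feature_row(row):
    """Convert one feature query result row (a dict) into typed values."""
    # The "[xmin,ymin,xmax,ymax]" string form is valid JSON, so json.loads can read it.
    xmin, ymin, xmax, ymax = (float(v) for v in json.loads(row["bbox"]))
    return {
        "label": row["display"],
        "facet": row["facet_id"],
        # Name/value pair that identifies the feature in the facet table.
        "filter": {row["id_field_name"]: row["feature_id"]},
        "bbox": (xmin, ymin, xmax, ymax),
        "srid": int(row["srid"]),
    }


# Made-up sample row, for illustration only.
sample_row = {
    "display": "Aare (river)",
    "facet_id": "fluesse_search",
    "id_field_name": "ogc_fid",
    "feature_id": 42,
    "bbox": "[2600000,1200000,2610000,1210000]",
    "srid": 2056,
}
print(parse_feature_row(sample_row)["bbox"])
# (2600000.0, 1200000.0, 2610000.0, 1210000.0)
```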
276+
277+
Example:
278+
279+
SELECT display, facet_id, id_field_name, feature_id, bbox, srid, similarity(searchterms, :term) sml
280+
FROM public.search_index WHERE searchterms % :term OR searchterms ILIKE '%' || :term || '%' ORDER BY sml DESC;
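For intuition about what `similarity()` and the threshold measure, here is a simplified Python approximation of trigram similarity: each word is lowercased and padded with two leading spaces and one trailing space, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. This is a sketch for understanding only; PostgreSQL's `pg_trgm` extension is the authoritative implementation and may differ in details such as character handling.

```python
import re


def trigrams(text):
    """Return the set of trigrams of `text`, approximating pg_trgm's scheme."""
    result = set()
    # Lowercase, then treat every run of letters/digits as one word.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(round(similarity("cat", "car"), 3))
# 0.333
```

With `trgm_similarity_threshold` set to `0.3`, a pair scoring around `0.333` would sit just above the threshold, while unrelated strings score close to `0`.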
281+
282+
The `trgm_layer_query` must return the following fields:
283+
284+
* `display`: The label to display in the search results.
285+
* `dataproduct_id`: The id of the dataproduct.
286+
* `has_info`: Whether an abstract is available for the dataproduct.
287+
* `sublayers`: A JSON stringified array of the shape `[{"ident": "<dataproduct_id>", "display": "<display>", "dset_info": true}, ...]`, or `NULL` if no sublayers exist.
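The `sublayers` value can be easier to picture with a small example. The Python sketch below builds such a string from `(dataproduct_id, display, has_info)` tuples; the helper `build_sublayers` is hypothetical, and mapping `has_info` to `dset_info` is an assumption made for illustration. In practice the string would typically be assembled inside the SQL query itself.

```python
import json


def build_sublayers(children):
    """Serialize (dataproduct_id, display, has_info) tuples, or return None."""
    if not children:
        return None  # corresponds to SQL NULL when no sublayers exist
    return json.dumps([
        {"ident": ident, "display": display, "dset_info": has_info}
        for ident, display, has_info in children
    ])


# Made-up sublayer entries, for illustration only.
print(build_sublayers([("rivers", "Rivers", True), ("lakes", "Lakes", False)]))
```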
288+
289+
*Note*: The layer query relies on an additional service, configured as `dataproductServiceUrl` in the viewer `config.json`, which resolves the `dataproduct_id` to a QWC theme sublayer object, like the [`sogis-dataproduct-service`](https://github.com/qwc-services/sogis-dataproduct-service).
290+
291+
*Note*: Set `FLASK_DEBUG=1` as environment variable for the search service to see additional logging information.
292+
293+
### Fulltext search with Solr backend
294+
295+
To use the solr backend, you need to run a solr search service and point `solr_service_url` to the corresponding URL. You can find the solr documentation at [https://lucene.apache.org/solr/guide/8_0/](https://lucene.apache.org/solr/guide/8_0/).
296+
297+
Next, create search XML configuration files in `volumes/solr/configsets/gdi/conf/`. The name of the file can be chosen freely. Example:
216298

217299
```xml
218300
<dataConfig>
@@ -223,11 +305,11 @@ Here is an example XML file:
223305
password="{DB_PASSWORD}"
224306
/>
225307
<document>
226-
<entity name="{SEARCH_NAME}" query="
308+
<entity name="{FACET_NAME}" query="
227309
WITH index_base AS (
228310
/* ==== Base query for search index ==== */
229311
SELECT
230-
'{SEARCH_NAME}'::text AS subclass,
312+
'{FACET_NAME}'::text AS subclass,
231313
{PRIMARY_KEY} AS id_in_class,
232314
'{PRIMARY_KEY}' AS id_name,
233315
'{SEARCH_FIELD_DATA_TYPE}:n' AS id_type,
@@ -262,7 +344,7 @@ The next table shows how the values need to be defined:
262344
| `DB_PORT` | Database port number | `5432` |
263345
| `DB_USER` | Database username | `qwc_service` |
264346
| `DB_PASSWORD` | Password for the specified database user | `qwc_service` |
265-
| `SEARCH_NAME` | Name of the search | `fluesse_search` |
347+
| `FACET_NAME` | Name of the search facet | `fluesse_search` |
266348
| `PRIMARY_KEY` | Primary key name of the table that is used in the search query | `ogc_fid` |
267349
| `SEARCH_FIELD_DATA_TYPE` | Search field data type | `str` |
268350
| `DISPLAYTEXT` | Displaytext that will be shown by the QWC2 when a match was found | `name_long` |
@@ -281,7 +363,7 @@ In the `volumes/solr/configsets/gdi/conf/solrconfig.xml` file you have to look f
281363
`<!-- SearchHandler` and add the following configuration
282364

283365
```xml
284-
<requestHandler name="/SEARCH_NAME" class="solr.DataImportHandler">
366+
<requestHandler name="/FACET_NAME" class="solr.DataImportHandler">
285367
<lst name="defaults">
286368
<str name="config">NAME_OF_THE_CONFIGURATION_FILE.xml</str>
287369
</lst>
@@ -293,57 +375,23 @@ Finally, the `solr` index has to be generated:
293375
```
294376
rm -rf volumes/solr/data/*
295377
docker compose restart qwc-solr
296-
curl 'http://localhost:8983/solr/gdi/SEARCH_NAME?command=full-import'
297-
```
298-
299-
### Configure fulltext service
300-
301-
The configuration of the fulltext search service can be found in `tenantConfig.json`.
302-
Search the `services` list for the JSON object that has `search` as its name.
303-
Then add a new facet to the facets list. An example entry could be:
304-
305-
```json
306-
{
307-
"name": "search",
308-
"config": {
309-
"solr_service_url": "http://qwc-solr:8983/solr/gdi/select",
310-
"search_result_limit": 50,
311-
"db_url": "postgresql:///?service=qwc_geodb"
312-
},
313-
"resources": {
314-
"facets": [
315-
{
316-
"name": "SEARCH_NAME",
317-
"filter_word": "OPTIONAL_SEARCH_FILTER",
318-
"table_name": "SCHEMA.SEARCH_TABLE_NAME",
319-
"geometry_column": "GEOMETRY_FIELD",
320-
"search_id_col": "PRIMARY_KEY"
321-
}
322-
]
323-
}
324-
}
378+
curl 'http://localhost:8983/solr/gdi/FACET_NAME?command=full-import'
325379
```
326380

327-
The `filter_word` field can be specified to activate / deactivate searches,
328-
if you have configure multiple searches for one theme.
329-
Normally `filter_word` is left empty (`""`) which results in the search always
330-
being active.
331-
But if specified (e.g. `"house_no"`) then the fulltext search will only use
332-
the configured search, if the user prefixes his search text with `"house_no:"`.
333381

334-
### Activate search for a theme
382+
### Configuring the search for a theme
335383

336-
As a final step, you have to configure the search for the desired themes and give the users the necessary rights in the Admin GUI.
384+
Finally, configure the search facets for the desired themes and give the users the necessary rights in the Admin GUI.
337385

338386
1. Add the following to a theme item in `themesConfig.json`:
339387

340388
```json
341389
"searchProviders": [
342390
{
343391
"provider": "solr",
344-
"default": [<SEARCH_NAME>],
392+
"default": [<FACET_NAME>],
345393
"layers": {
346-
"<layer_name>": "<SEARCH_NAME>"
394+
"<layer_name>": "<FACET_NAME>"
347395
}
348396
}
349397
]
@@ -354,14 +402,10 @@ When activating a search to a theme, you can either:
354402
* Add the search name to the `default` list, resulting in the search always being active.
355403
* Add the search name to the `layers` object, resulting in the search being active only if the theme layer `<layer_name>` is present in the theme.
356404

357-
2. Create a new resource in the Admin GUI
405+
2. Create `Search facet` resources in the Admin GUI for the desired facet names.
358406

359-
![Create resource](../images/create_solr_search_facet_resource.png?style=centerme)
407+
* To manage layer search permissions, you can create a `Search facet` with name `dataproduct`.
408+
* You can create a wildcard `Search facet` resource by setting the name to `*`. This is useful to assign permissions for all available facets with one single resource.
360409

361410
3. Add permissions on the newly created resource
362-
363-
![Add permission](../images/add_solr_search_facet_permission.png?style=centerme)
364-
365411
4. Re-generate the services configurations with the `Generate service configuration` button
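The interplay of the `default` list and the `layers` object from step 1 can be sketched in Python as follows. The function `active_facets` and the sample facet and layer names are hypothetical; the sketch only restates the rule described above and is not the viewer's actual code.

```python
def active_facets(provider, theme_layers):
    """Return the facet names that are active for the given theme layers."""
    # Facets in "default" are always active.
    facets = list(provider.get("default", []))
    # Facets in "layers" are active only if their layer is present in the theme.
    for layer_name, facet in provider.get("layers", {}).items():
        if layer_name in theme_layers and facet not in facets:
            facets.append(facet)
    return facets


# Made-up provider entry, for illustration only.
provider = {
    "provider": "solr",
    "default": ["addresses"],
    "layers": {"rivers": "fluesse_search"},
}
print(active_facets(provider, {"rivers", "roads"}))
# ['addresses', 'fluesse_search']
print(active_facets(provider, {"roads"}))
# ['addresses']
```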
366-
367-
![Generate service configurations](../images/generate_service_configurations.png?style=centerme)
