Solr Tutorial

This tutorial covers getting Solr up and running, ingesting a variety of data sources into Solr collections, and getting a feel for the Solr administrative and search interfaces.

The tutorial is organized into three sections that each build on the one before it. The first exercise will ask you to start Solr, create a collection, index some basic documents, and then perform some searches.

The second exercise works with a different set of data, and explores requesting facets with the dataset.

The third exercise encourages you to begin to work with your own data and start a plan for your implementation.

Finally, we’ll introduce spatial search and show you how to get your Solr instance back into a clean state.

Before You Begin

To follow along with this tutorial, you will need:

  1. To meet the system requirements

  2. An Apache Solr release download. This tutorial is designed for Apache Solr 7.3.

For best results, please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.

Unpack Solr

Begin by unzipping the Solr release and changing your working directory to the subdirectory where Solr was installed. For example, with a shell in UNIX, Cygwin, or MacOS:

~$ ls solr*
solr-7.3.0.zip

~$ unzip -q solr-7.3.0.zip

~$ cd solr-7.3.0/

If you’d like to know more about Solr’s directory layout before moving to the first exercise, see the section Directory Layout for details.

Exercise 1: Index Techproducts Example Data

This exercise will walk you through how to start Solr as a two-node cluster (both nodes on the same machine) and create a collection during startup. Then you will index some sample data that ships with Solr and do some basic searches.

Launch Solr in SolrCloud Mode

To launch Solr, run: bin/solr start -e cloud on Unix or MacOS; bin\solr.cmd start -e cloud on Windows.

This will start an interactive session that will start two Solr "servers" on your machine. This command has an option to run without prompting you for input (-noprompt), but we want to modify two of the defaults so we won’t use that option now.

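For reference, the fully non-interactive form (which we’re skipping here, since we want to change two of the defaults) would be:

bin/solr start -e cloud -noprompt
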
solr-7.3.0:$ ./bin/solr start -e cloud

Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:

The first prompt asks how many nodes we want to run. Note the [2] at the end of the last line; that is the default number of nodes. Two is what we want for this example, so you can simply press enter.

Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:

This will be the port that the first node runs on. Unless you know you have something else running on port 8983 on your machine, accept this default option also by pressing enter. If something is already using that port, you will be asked to choose another port.

Please enter the port for node2 [7574]:

This is the port the second node will run on. Again, unless you know you have something else running on port 7574 on your machine, accept this default option also by pressing enter. If something is already using that port, you will be asked to choose another port.

Solr will now initialize itself and start running on those two nodes. The script will print the commands it uses for your reference.

Starting up 2 Solr nodes for your example SolrCloud cluster.

Creating Solr home directory /solr-7.3.0/example/cloud/node1/solr
Cloning /solr-7.3.0/example/cloud/node1 into
   /solr-7.3.0/example/cloud/node2

Starting up Solr on port 8983 using command:
"bin/solr" start -cloud -p 8983 -s "example/cloud/node1/solr"

Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=34942). Happy searching!


Starting up Solr on port 7574 using command:
"bin/solr" start -cloud -p 7574 -s "example/cloud/node2/solr" -z localhost:9983

Waiting up to 180 seconds to see Solr running on port 7574 [\]
Started Solr server on port 7574 (pid=35036). Happy searching!

INFO  - 2017-07-27 12:28:02.835; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready

Notice that two instances of Solr have started on two nodes. Because we are starting in SolrCloud mode, and did not define any details about an external ZooKeeper cluster, Solr launches its own ZooKeeper and connects both nodes to it.

After startup is complete, you’ll be prompted to create a collection to use for indexing data.

Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]

Here’s the first place where we’ll deviate from the default options. This tutorial will ask you to index some sample data included with Solr, called the "techproducts" data. Let’s name our collection "techproducts" so it’s easy to differentiate from other collections we’ll create later. Enter techproducts at the prompt and hit enter.

How many shards would you like to split techproducts into? [2]

This is asking how many shards you want to split your index into across the two nodes. Choosing "2" (the default) means we will split the index relatively evenly across both nodes, which is a good way to start. Accept the default by hitting enter.

How many replicas per shard would you like to create? [2]

A replica is a copy of the index that’s used for failover (see also the Solr Glossary definition). Again, the default of "2" is fine to start with here also, so accept the default by hitting enter.

Please choose a configuration for the techproducts collection, available options are:
_default or sample_techproducts_configs [_default]

We’ve reached another point where we will deviate from the default option. Solr has two sample sets of configuration files (called a configSet) available out-of-the-box.

A collection must have a configSet, which at a minimum includes the two main configuration files for Solr: the schema file (named either managed-schema or schema.xml), and solrconfig.xml. The question here is which configSet you would like to start with. The _default is a bare-bones option, but note there’s one whose name includes "techproducts", the same as we named our collection. This configSet is specifically designed to support the sample data we want to use, so enter sample_techproducts_configs at the prompt and hit enter.

At this point, Solr will create the collection and again output to the screen the commands it issues.

Uploading /solr-7.3.0/server/solr/configsets/_default/conf for config techproducts to ZooKeeper at localhost:9983

Connecting to ZooKeeper at localhost:9983 ...
INFO  - 2017-07-27 12:48:59.289; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready
Uploading /solr-7.3.0/server/solr/configsets/sample_techproducts_configs/conf for config techproducts to ZooKeeper at localhost:9983

Creating new collection 'techproducts' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=techproducts&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=techproducts

{
  "responseHeader":{
    "status":0,
    "QTime":5460},
  "success":{
    "192.168.0.110:7574_solr":{
      "responseHeader":{
        "status":0,
        "QTime":4056},
      "core":"techproducts_shard1_replica_n1"},
    "192.168.0.110:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":4056},
      "core":"techproducts_shard2_replica_n2"}}}

Enabling auto soft-commits with maxTime 3 secs using the Config API

POSTing request to Config API: http://localhost:8983/solr/techproducts/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000

SolrCloud example running, please visit: http://localhost:8983/solr

Congratulations! Solr is ready for data!

You can see that Solr is running by launching the Solr Admin UI in your web browser: http://localhost:8983/solr/. This is the main starting point for administering Solr.

Solr will now be running two "nodes", one on port 7574 and one on port 8983. There is one collection created automatically, techproducts, a two-shard collection with two replicas per shard.

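If you’d rather check this from the command line than the Admin UI, the Collections API can report the same layout; this is an optional check, not part of the original walkthrough:

curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=techproducts"

The response lists both shards of techproducts and the replicas assigned to each node.
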
The Cloud tab in the Admin UI diagrams the collection nicely:

Figure 1. SolrCloud Diagram

Index the Techproducts Data

Your Solr server is up and running, but it doesn’t contain any data yet, so we can’t do any queries.

Solr includes the bin/post tool in order to facilitate indexing various types of documents easily. We’ll use this tool for the indexing examples below.

You’ll need a command shell to run some of the following examples, rooted in the Solr install directory; the shell from where you launched Solr works just fine.

Currently the bin/post tool does not have a comparable Windows script, but the underlying Java program invoked is available. We’ll show examples below for Windows, but you can also see the Windows section of the Post Tool documentation for more details.

The data we will index is in the example/exampledocs directory. The documents are in a mix of document formats (JSON, CSV, etc.), and fortunately we can index them all at once:

Linux/Mac
solr-7.3.0:$ bin/post -c techproducts example/exampledocs/*
Windows
C:\solr-7.3.0> java -jar -Dc=techproducts -Dauto example\exampledocs\post.jar example\exampledocs\*

You should see output similar to the following:

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
POSTing file books.json (application/json) to [base]/json/docs
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file more_books.jsonl (application/json) to [base]/json/docs
POSTing file mp500.xml (application/xml) to [base]
POSTing file post.jar (application/octet-stream) to [base]/extract
POSTing file sample.html (text/html) to [base]/extract
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr-word.pdf (application/pdf) to [base]/extract
POSTing file solr.xml (application/xml) to [base]
POSTing file test_utf8.sh (application/octet-stream) to [base]/extract
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
21 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...
Time spent: 0:00:00.822

Congratulations again! You have data in your Solr!

Now we’re ready to start searching.

Basic Searching

Solr can be queried via REST clients, curl, wget, Chrome POSTMAN, etc., as well as via native clients available for many programming languages.

The Solr Admin UI includes a query builder interface via the Query tab for the techproducts collection (at http://localhost:8983/solr/#/techproducts/query). If you click the Execute Query button without changing anything in the form, you’ll get 10 documents in JSON format:

Figure 2. Query Screen

The URL sent by the Admin UI to Solr is shown in light grey near the top right of the above screenshot. If you click on it, your browser will show you the raw response.

To use curl, give the same URL shown in your browser in quotes on the command line:

curl "http://localhost:8983/solr/techproducts/select?indent=on&q=*:*"

What’s happening here is that we are using Solr’s query parameter (q) with a special syntax that requests all documents in the index (*:*). Not all of the documents are returned to us, however, because of the default value of a parameter called rows, which you can see in the form is 10. You can change the parameter in the UI or in the defaults if you wish.

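For example, to request all documents but have only the first two returned, you can override rows directly on the URL:

curl "http://localhost:8983/solr/techproducts/select?q=*:*&rows=2"
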
Solr has very powerful search options, and this tutorial won’t be able to cover all of them. But we can cover some of the most common types of queries.

Search for a Single Term

To search for a term, enter it as the q parameter value in the Solr Admin UI Query screen, replacing *:* with the term you want to find.

Enter "foundation" and hit Execute Query again.

If you prefer curl, enter something like this:

curl "http://localhost:8983/solr/techproducts/select?q=foundation"

You’ll see something like this:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":8,
    "params":{
      "q":"foundation"}},
  "response":{"numFound":4,"start":0,"maxScore":2.7879646,"docs":[
      {
        "id":"0553293354",
        "cat":["book"],
        "name":"Foundation",
        "price":7.99,
        "price_c":"7.99,USD",
        "inStock":true,
        "author":"Isaac Asimov",
        "author_s":"Isaac Asimov",
        "series_t":"Foundation Novels",
        "sequence_i":1,
        "genre_s":"scifi",
        "_version_":1574100232473411586,
        "price_c____l_ns":799}]
}}

The response indicates that there are 4 hits ("numFound":4). We’ve only included one document in the above sample output, but since 4 hits is lower than the rows parameter default of 10, you should see all 4 of them.

Note the responseHeader before the documents. This header will include the parameters you have set for the search. By default it shows only the parameters you have set for this query, which in this case is only your query term.

The documents we got back include all the fields for each document that were indexed. This is, again, default behavior. If you want to restrict the fields in the response, you can use the fl param, which takes a comma-separated list of field names. This is one of the available fields on the query form in the Admin UI.

Put "id" (without quotes) in the "fl" box and hit Execute Query again. Or, to specify it with curl:

curl "http://localhost:8983/solr/techproducts/select?q=foundation&fl=id"

You should only see the IDs of the matching records returned.

Field Searches

All Solr queries look for documents using some field. Often you want to query across multiple fields at the same time, and this is what we’ve done so far with the "foundation" query. This is possible with the use of copy fields, which are set up already with this set of configurations. We’ll cover copy fields a little bit more in Exercise 2.

Sometimes, though, you want to limit your query to a single field. This can make your queries more efficient and the results more relevant for users.

Much of the data in our small sample data set is related to products. Let’s say we want to find all the "electronics" products in the index. In the Query screen, enter "electronics" (without quotes) in the q box and hit Execute Query. You should get 14 results, such as:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":6,
    "params":{
      "q":"electronics"}},
  "response":{"numFound":14,"start":0,"maxScore":1.5579545,"docs":[
      {
        "id":"IW-02",
        "name":"iPod & iPod Mini USB 2.0 Cable",
        "manu":"Belkin",
        "manu_id_s":"belkin",
        "cat":["electronics",
          "connector"],
        "features":["car power adapter for iPod, white"],
        "weight":2.0,
        "price":11.5,
        "price_c":"11.50,USD",
        "popularity":1,
        "inStock":false,
        "store":"37.7752,-122.4232",
        "manufacturedate_dt":"2006-02-14T23:55:59Z",
        "_version_":1574100232554151936,
        "price_c____l_ns":1150}]
}}

This search finds all documents that contain the term "electronics" anywhere in the indexed fields. However, we can see from the above there is a cat field (for "category"). If we limit our search for only documents with the category "electronics", the results will be more precise for our users.

Update your query in the q field of the Admin UI so it’s cat:electronics. Now you get 12 results:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":6,
    "params":{
      "q":"cat:electronics"}},
  "response":{"numFound":12,"start":0,"maxScore":0.9614112,"docs":[
      {
        "id":"SP2514N",
        "name":"Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133",
        "manu":"Samsung Electronics Co. Ltd.",
        "manu_id_s":"samsung",
        "cat":["electronics",
          "hard drive"],
        "features":["7200RPM, 8MB cache, IDE Ultra ATA-133",
          "NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor"],
        "price":92.0,
        "price_c":"92.0,USD",
        "popularity":6,
        "inStock":true,
        "manufacturedate_dt":"2006-02-13T15:26:37Z",
        "store":"35.0752,-97.032",
        "_version_":1574100232511160320,
        "price_c____l_ns":9200}]
     }}

Using curl, this query would look like this:

curl "http://localhost:8983/solr/techproducts/select?q=cat:electronics"

To search for a multi-term phrase, enclose it in double quotes: q="multiple terms here". For example, search for "CAS latency" by entering that phrase in quotes to the q box in the Admin UI.

If you’re following along with curl, note that the space between terms must be converted to "+" in a URL, like so:

curl "http://localhost:8983/solr/techproducts/select?q=\"CAS+latency\""

We get 2 results:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":7,
    "params":{
      "q":"\"CAS latency\""}},
  "response":{"numFound":2,"start":0,"maxScore":5.937691,"docs":[
      {
        "id":"VDBDB1A16",
        "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
        "manu":"A-DATA Technology Inc.",
        "manu_id_s":"corsair",
        "cat":["electronics",
          "memory"],
        "features":["CAS latency 3, 2.7v"],
        "popularity":0,
        "inStock":true,
        "store":"45.18414,-93.88141",
        "manufacturedate_dt":"2006-02-13T15:26:37Z",
        "payloads":"electronics|0.9 memory|0.1",
        "_version_":1574100232590852096},
      {
        "id":"TWINX2048-3200PRO",
        "name":"CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
        "manu":"Corsair Microsystems Inc.",
        "manu_id_s":"corsair",
        "cat":["electronics",
          "memory"],
        "features":["CAS latency 2, 2-3-3-6 timing, 2.75v, unbuffered, heat-spreader"],
        "price":185.0,
        "price_c":"185.00,USD",
        "popularity":5,
        "inStock":true,
        "store":"37.7752,-122.4232",
        "manufacturedate_dt":"2006-02-13T15:26:37Z",
        "payloads":"electronics|6.0 memory|3.0",
        "_version_":1574100232584560640,
        "price_c____l_ns":18500}]
  }}

Combining Searches

By default, when you search for multiple terms and/or phrases in a single query, Solr will only require that one of them is present in order for a document to match. Documents containing more terms will be sorted higher in the results list.

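For example, querying two bare terms matches documents that contain either of them; documents containing both simply rank higher:

curl "http://localhost:8983/solr/techproducts/select?q=electronics+music"
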
You can require that a term or phrase is present by prefixing it with a +; conversely, to disallow the presence of a term or phrase, prefix it with a -.

To find documents that contain both terms "electronics" and "music", enter +electronics +music in the q box in the Admin UI Query tab.

If you’re using curl, you must encode the + character because it has a reserved purpose in URLs (encoding the space character). The encoding for + is %2B as in:

curl "http://localhost:8983/solr/techproducts/select?q=%2Belectronics%20%2Bmusic"

You should only get a single result.

To search for documents that contain the term "electronics" but don’t contain the term "music", enter +electronics -music in the q box in the Admin UI. For curl, again, URL encode + as %2B as in:

curl "http://localhost:8983/solr/techproducts/select?q=%2Belectronics+-music"

This time you get 13 results.

More Information on Searching

We have only scratched the surface of the search options available in Solr. For more Solr search options, see the section on Searching.

Exercise 1 Wrap Up

At this point, you’ve seen how Solr can index data and have done some basic queries. You can choose now to continue to the next example which will introduce more Solr concepts, such as faceting results and managing your schema, or you can strike out on your own.

If you decide not to continue with this tutorial, the data we’ve indexed so far is likely of little value to you. You can delete your installation and start over, or you can use the bin/solr script we started out with to delete this collection:

bin/solr delete -c techproducts

And then create a new collection:

bin/solr create -c <yourCollection> -s 2 -rf 2

To stop both of the Solr nodes we started, issue the command:

bin/solr stop -all

For more information on start/stop and collection options with bin/solr, see Solr Control Script Reference.

Exercise 2: Modify the Schema and Index Films Data

This exercise will build on the last one and introduce you to the index schema and Solr’s powerful faceting features.

Restart Solr

Did you stop Solr after the last exercise? No? Then go ahead to the next section.

If you did, though, and need to restart Solr, issue these commands:

./bin/solr start -c -p 8983 -s example/cloud/node1/solr

This starts the first node. When it’s done, start the second node, and tell it how to connect to ZooKeeper:

./bin/solr start -c -p 7574 -s example/cloud/node2/solr -z localhost:9983

Create a New Collection

We’re going to use a whole new data set in this exercise, so it would be better to have a new collection instead of trying to reuse the one we had before.

One reason for this is we’re going to use a feature in Solr called "field guessing", where Solr attempts to guess what type of data is in a field while it’s indexing it. It also automatically creates new fields in the schema for new fields that appear in incoming documents. This mode is called "Schemaless". We’ll see the benefits and limitations of this approach to help you decide how and where to use it in your real application.

What is a "schema" and why do I need one?

Solr’s schema is a single file (in XML) that stores the details about the fields and field types Solr is expected to understand. The schema defines not only the field or field type names, but also any modifications that should happen to a field before it is indexed. For example, if you want to ensure that a user who enters "abc" and a user who enters "ABC" can both find a document containing the term "ABC", you will want to normalize (lower-case it, in this case) "ABC" when it is indexed, and normalize the user query to be sure of a match. These rules are defined in your schema.

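As a minimal sketch of such a rule, the Schema API call below defines a field type that lower-cases text at both index and query time; the type name text_lower and the collection name mycollection are placeholders here, not part of the sample configurations:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
    "name": "text_lower",
    "class": "solr.TextField",
    "analyzer": {
      "tokenizer": {"class": "solr.StandardTokenizerFactory"},
      "filters": [{"class": "solr.LowerCaseFilterFactory"}]
    }}}' http://localhost:8983/solr/mycollection/schema
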
Earlier in the tutorial we mentioned copy fields, which are fields made up of data that originated from other fields. You can also define dynamic fields, which use wildcards (such as *_t or *_s) to dynamically create fields of a specific field type. These types of rules are also defined in the schema.

When you initially started Solr in the first exercise, we had a choice of a configSet to use. The one we chose had a schema that was pre-defined for the data we later indexed. This time, we’re going to use a configSet that has a very minimal schema and let Solr figure out from the data what fields to add.

The data you’re going to index is related to movies, so start by creating a collection named "films" that uses the _default configSet:

bin/solr create -c films -s 2 -rf 2

Whoa, wait. We didn’t specify a configSet! That’s fine, the _default is appropriately named, since it’s the default and is used if you don’t specify one at all.

We did, however, set two parameters -s and -rf. Those are the number of shards to split the collection across (2) and how many replicas to create (2). This is equivalent to the options we had during the interactive example from the first exercise.

You should see output like:

WARNING: Using _default configset. Data driven schema functionality is enabled by default, which is
         NOT RECOMMENDED for production use.

         To turn it off:
            curl http://localhost:7574/solr/films/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'

Connecting to ZooKeeper at localhost:9983 ...
INFO  - 2017-07-27 15:07:46.191; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready
Uploading /7.3.0/server/solr/configsets/_default/conf for config films to ZooKeeper at localhost:9983

Creating new collection 'films' using command:
http://localhost:7574/solr/admin/collections?action=CREATE&name=films&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=films

{
  "responseHeader":{
    "status":0,
    "QTime":3830},
  "success":{
    "192.168.0.110:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":2076},
      "core":"films_shard2_replica_n1"},
    "192.168.0.110:7574_solr":{
      "responseHeader":{
        "status":0,
        "QTime":2494},
      "core":"films_shard1_replica_n2"}}}

The first thing the command printed was a warning about not using this configSet in production. That’s due to some of the limitations we’ll cover shortly.

Otherwise, though, the collection should be created. If we go to the Admin UI at http://localhost:8983/solr/#/films/collection-overview we should see the overview screen.

Preparing Schemaless for the Films Data

There are two parallel things happening with the schema that comes with the _default configSet.

First, we are using a "managed schema", which is configured to only be modified by Solr’s Schema API. That means we should not hand-edit it so there isn’t confusion about which edits come from which source. Solr’s Schema API allows us to make changes to fields, field types, and other types of schema rules.

Second, we are using "field guessing", which is configured in the solrconfig.xml file (and includes most of Solr’s various configuration settings). Field guessing is designed to allow us to start using Solr without having to define all the fields we think will be in our documents before trying to index them. This is why we call it "schemaless", because you can start quickly and let Solr create fields for you as it encounters them in documents.

Sounds great! Well, not really, there are limitations. It’s a bit brute force, and if it guesses wrong, you can’t change much about a field after data has been indexed without having to reindex. If we only have a few thousand documents that might not be bad, but if you have millions and millions of documents, or, worse, don’t have access to the original data anymore, this can be a real problem.

For these reasons, the Solr community does not recommend going to production without a schema that you have defined yourself. By this we mean that the schemaless features are fine to start with, but you should still always make sure your schema matches your expectations for how you want your data indexed and how users are going to query it.

It is possible to mix schemaless features with a defined schema. Using the Schema API, you can define a few fields that you know you want to control, and let Solr guess others that are less important or which you are confident (through testing) will be guessed to your satisfaction. That’s what we’re going to do here.

Create the "names" Field

The films data we are going to index has a small number of fields for each movie: an ID, director name(s), film name, release date, and genre(s).

If you look at one of the files in example/films, you’ll see the first film is named .45, released in 2006. As the first document in the dataset, Solr is going to guess the field type based on the data in the record. If we go ahead and index this data, that first film name is going to indicate to Solr that that field type is a "float" numeric field, and will create a "name" field with a type FloatPointField. All data after this record will be expected to be a float.

Well, that’s not going to work. We have titles like A Mighty Wind and Chicken Run, which are strings - decidedly not numeric and not floats. If we let Solr guess the "name" field is a float, what will happen is later titles will cause an error and indexing will fail. That’s not going to get us very far.

What we can do is set up the "name" field in Solr before we index the data to be sure Solr always interprets it as a string. At the command line, enter this curl command:

curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' http://localhost:8983/solr/films/schema

This command uses the Schema API to explicitly define a field named "name" that has the field type "text_general" (a text field). It will not be permitted to have multiple values, but it will be stored (meaning it can be retrieved by queries).

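If you’d like to confirm the field was created as expected, the Schema API can also read definitions back (an optional check, not part of the original steps):

curl http://localhost:8983/solr/films/schema/fields/name
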
You can also use the Admin UI to create fields, but it offers a bit less control over the properties of your field. It will work for our case, though:

Figure 3. Creating a field

Create a "catchall" Copy Field

There’s one more change to make before we start indexing.

In the first exercise when we queried the documents we had indexed, we didn’t have to specify a field to search because the configuration we used was set up to copy fields into a text field, and that field was the default when no other field was defined in the query.

The configuration we’re using now doesn’t have that rule. We would need to define a field to search for every query. We can, however, set up a "catchall field" by defining a copy field that will take all data from all fields and index it into a field named _text_. Let’s do that now.

You can use either the Admin UI or the Schema API for this.

At the command line, use the Schema API again to define a copy field:

curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' http://localhost:8983/solr/films/schema

In the Admin UI, choose Add Copy Field, then fill out the source and destination for your field, as in this screenshot.

Figure 4. Creating a copy field

What this does is make a copy of all fields and put the data into the "_text_" field.

It can be very expensive to do this with your production data because it tells Solr to effectively index everything twice. It will make indexing slower, and make your index larger. With your production data, you will want to be sure you only copy fields that really warrant it for your application.

OK, now we’re ready to index the data and start playing around with it.

Index Sample Film Data

The films data we will index is located in the example/films directory of your installation. It comes in three formats: JSON, XML and CSV. Pick one of the formats and index it into the "films" collection (in each example, one command is for Unix/MacOS and the other is for Windows):

To Index JSON Format
bin/post -c films example/films/films.json

C:\solr-7.3.0> java -jar -Dc=films -Dauto example\exampledocs\post.jar example\films\*.json
To Index XML Format
bin/post -c films example/films/films.xml

C:\solr-7.3.0> java -jar -Dc=films -Dauto example\exampledocs\post.jar example\films\*.xml
To Index CSV Format
bin/post -c films example/films/films.csv -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"

C:\solr-7.3.0> java -jar -Dc=films -Dparams=f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=| -Dauto example\exampledocs\post.jar example\films\*.csv

Each command includes these main parameters:

  • -c films: this is the Solr collection to index data to.

  • example/films/films.json (or films.xml or films.csv): this is the path to the data file to index. You could simply supply the directory where this file resides, but since you know the format you want to index, specifying the exact file for that format is more efficient.

Note the CSV command includes extra parameters. This is to ensure multi-valued entries in the "genre" and "directed_by" columns are split by the pipe (|) character, used in this file as a separator. Telling Solr to split these columns this way will ensure proper indexing of the data.

Each command will produce output similar to the below seen while indexing JSON:

$ ./bin/post -c films example/films/films.json
/bin/java -classpath /solr-7.3.0/dist/solr-core-7.3.0.jar -Dauto=yes -Dc=films -Ddata=files org.apache.solr.util.SimplePostTool example/films/films.json
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/films/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file films.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:00.878

Hooray!

If you go to the Query screen in the Admin UI for films (http://localhost:8983/solr/#/films/query) and hit Execute Query you should see 1100 results, with the first 10 returned to the screen.

Let’s do a query to see if the "catchall" field worked properly. Enter "comedy" in the q box and hit Execute Query again. You should see 417 results. Feel free to play around with other searches before we move on to faceting.

Faceting

One of Solr’s most popular features is faceting. Faceting allows the search results to be arranged into subsets (or buckets, or categories), providing a count for each subset. There are several types of faceting: field values, numeric and date ranges, pivots (decision tree), and arbitrary query faceting.

Field Facets

In addition to providing search results, a Solr query can return the number of documents that contain each unique value in the whole result set.

On the Admin UI Query tab, if you check the facet checkbox, you’ll see a few facet-related options appear:

Figure 5. Facet options in the Query screen

To see facet counts from all documents (q=*:*): turn on faceting (facet=true), and specify the field to facet on via the facet.field param. If you only want facets, and no document contents, specify rows=0. The curl command below will return facet counts for the genre_str field:

curl "http://localhost:8983/solr/films/select?q=*:*&rows=0&facet=true&facet.field=genre_str"

In your terminal, you’ll see something like:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":11,
    "params":{
      "q":"*:*",
      "facet.field":"genre_str",
      "rows":"0",
      "facet":"true"}},
  "response":{"numFound":1100,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "genre_str":[
        "Drama",552,
        "Comedy",389,
        "Romance Film",270,
        "Thriller",259,
        "Action Film",196,
        "Crime Fiction",170,
        "World cinema",167]},
        "facet_ranges":{},
        "facet_intervals":{},
        "facet_heatmaps":{}}}

We’ve truncated the output here a little bit, but in the facet_counts section, you see by default you get a count of the number of documents using each genre for every genre in the index. Solr has a parameter facet.mincount that you could use to limit the facets to only those that contain a certain number of documents (this parameter is not shown in the UI). Or, perhaps you do want all the facets, and you’ll let your application’s front-end control how it’s displayed to users.

If you wanted to control the number of items in a bucket, you could do something like this:

curl "http://localhost:8983/solr/films/select?q=*:*&facet.field=genre_str&facet.mincount=200&facet=on&rows=0"

You should only see 4 facets returned.

There are a great deal of other parameters available to help you control how Solr constructs the facets and facet lists. We’ll cover some of them in this exercise, but you can also see the section Faceting for more detail.

Range Facets

For numerics or dates, it’s often desirable to partition the facet counts into ranges rather than discrete values. A prime example of numeric range faceting, using the example techproducts data from our previous exercise, is price. In the /browse UI, it looks like this:

Figure 6. Range facets

The films data includes the release date for films, and we could use that to create date range facets, which are another common use for range facets.

The Solr Admin UI doesn’t yet support range facet options, so you will need to use curl or similar command line tool for the following examples.

If we construct a query that looks like this:

curl 'http://localhost:8983/solr/films/select?q=*:*&rows=0'\
    '&facet=true'\
    '&facet.range=initial_release_date'\
    '&facet.range.start=NOW-20YEAR'\
    '&facet.range.end=NOW'\
    '&facet.range.gap=%2B1YEAR'

This will request all films and ask for them to be grouped by year starting with 20 years ago (our earliest release date is in 2000) and ending today. Note that this query again URL encodes a + as %2B.

In the terminal you will see:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":8,
    "params":{
      "facet.range":"initial_release_date",
      "facet.limit":"300",
      "q":"*:*",
      "facet.range.gap":"+1YEAR",
      "rows":"0",
      "facet":"on",
      "facet.range.start":"NOW-20YEAR",
      "facet.range.end":"NOW"}},
  "response":{"numFound":1100,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_ranges":{
      "initial_release_date":{
        "counts":[
          "1997-07-28T17:12:06.919Z",0,
          "1998-07-28T17:12:06.919Z",0,
          "1999-07-28T17:12:06.919Z",48,
          "2000-07-28T17:12:06.919Z",82,
          "2001-07-28T17:12:06.919Z",103,
          "2002-07-28T17:12:06.919Z",131,
          "2003-07-28T17:12:06.919Z",137,
          "2004-07-28T17:12:06.919Z",163,
          "2005-07-28T17:12:06.919Z",189,
          "2006-07-28T17:12:06.919Z",92,
          "2007-07-28T17:12:06.919Z",26,
          "2008-07-28T17:12:06.919Z",7,
          "2009-07-28T17:12:06.919Z",3,
          "2010-07-28T17:12:06.919Z",0,
          "2011-07-28T17:12:06.919Z",0,
          "2012-07-28T17:12:06.919Z",1,
          "2013-07-28T17:12:06.919Z",1,
          "2014-07-28T17:12:06.919Z",1,
          "2015-07-28T17:12:06.919Z",0,
          "2016-07-28T17:12:06.919Z",0],
        "gap":"+1YEAR",
        "start":"1997-07-28T17:12:06.919Z",
        "end":"2017-07-28T17:12:06.919Z"}},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

Pivot Facets

Another faceting type is pivot facets, also known as "decision trees", allowing two or more fields to be nested for all the various possible combinations. Using the films data, pivot facets can be used to see how many of the films in the "Drama" category (the genre_str field) are directed by a director. Here’s how to get at the raw data for this scenario:

curl "http://localhost:8983/solr/films/select?q=*:*&rows=0&facet=on&facet.pivot=genre_str,directed_by_str"

This results in the following response, which shows a facet for each category and director combination:

{"responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":1147,
    "params":{
      "q":"*:*",
      "facet.pivot":"genre_str,directed_by_str",
      "rows":"0",
      "facet":"on"}},
  "response":{"numFound":1100,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{},
    "facet_pivot":{
      "genre_str,directed_by_str":[{
          "field":"genre_str",
          "value":"Drama",
          "count":552,
          "pivot":[{
              "field":"directed_by_str",
              "value":"Ridley Scott",
              "count":5},
            {
              "field":"directed_by_str",
              "value":"Steven Soderbergh",
              "count":5},
            {
              "field":"directed_by_str",
              "value":"Michael Winterbottom",
              "count":4}]}]}}}

We’ve truncated this output as well - you will see a lot of genres and directors on your screen.

Exercise 2 Wrap Up

In this exercise, we learned a little bit more about how Solr organizes data in the indexes, and how to work with the Schema API to manipulate the schema file. We also learned a bit about facets in Solr, including range facets and pivot facets. In both of these things, we’ve only scratched the surface of the available options. If you can dream it, it might be possible!

Like our previous exercise, this data may not be relevant to your needs. We can clean up our work by deleting the collection. To do that, issue this command at the command line:

bin/solr delete -c films

Exercise 3: Index Your Own Data

For this last exercise, work with a dataset of your choice. This can be files on your local hard drive, a set of data you have worked with before, or maybe a sample of the data you intend to index to Solr for your production application.

This exercise is intended to get you thinking about what you will need to do for your application:

  • What sorts of data do you need to index?

  • What will you need to do to prepare Solr for your data (such as creating specific fields, setting up copy fields, determining analysis rules, etc.)?

  • What kinds of search options do you want to provide to users?

  • How much testing will you need to do to ensure everything works the way you expect?

Create Your Own Collection

Before you get started, create a new collection, named whatever you’d like. In this example, the collection will be named "localDocs"; replace that name with whatever name you choose if you want to.

./bin/solr create -c localDocs -s 2 -rf 2

Again, as we saw from Exercise 2 above, this will use the _default configSet and all the schemaless features it provides. As we noted previously, this may cause problems when we index our data. You may need to iterate on indexing a few times before you get the schema right.

Indexing Ideas

Solr has lots of ways to index data. Choose one of the approaches below and try it out with your system:

Local Files with bin/post

If you have a local directory of files, the Post Tool (bin/post) can index a directory of files. We saw this in action in our first exercise.

We used only JSON, XML and CSV in our exercises, but the Post Tool can also handle HTML, PDF, Microsoft Office formats (such as MS Word), plain text, and more.

In this example, assume there is a directory named "Documents" locally. To index it, we would issue a command like this (correcting the collection name after the -c parameter as needed):

./bin/post -c localDocs ~/Documents

You may get errors as it works through your documents. These might be caused by the field guessing, or the file type may not be supported. Indexing content such as this demonstrates the need to plan Solr for your data, which requires understanding it and perhaps also some trial and error.

DataImportHandler

Solr includes a tool called the Data Import Handler (DIH) which can connect to databases (if you have a jdbc driver), mail servers, or other structured data sources. There are several examples included for feeds, GMail, and a small HSQL database.

The README.txt file in example/example-DIH will give you details on how to start working with this tool.

SolrJ

SolrJ is a Java-based client for interacting with Solr. Use SolrJ for JVM-based languages or other Solr clients to programmatically create documents to send to Solr.

Documents Screen

Use the Admin UI Documents tab (at http://localhost:8983/solr/#/localDocs/documents) to paste in a document to be indexed, or select Document Builder from the Document Type dropdown to build a document one field at a time. Click on the Submit Document button below the form to index your document.

Updating Data

You may notice that even if you index content in this tutorial more than once, it does not duplicate the results found. This is because the example Solr schema (a file named either managed-schema or schema.xml) specifies a uniqueKey field called id. Whenever you POST commands to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you.

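To see the replacement behavior for yourself, you could post the same small document twice (a hypothetical document, sent to the localDocs collection from this exercise); the second command overwrites the first instead of adding a new document:

curl -X POST -H 'Content-type:application/json' --data-binary '[{"id":"test-doc-1","name":"first version"}]' "http://localhost:8983/solr/localDocs/update?commit=true"
curl -X POST -H 'Content-type:application/json' --data-binary '[{"id":"test-doc-1","name":"second version"}]' "http://localhost:8983/solr/localDocs/update?commit=true"
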
You can see that that has happened by looking at the values for numDocs and maxDoc in the core-specific Overview section of the Solr Admin UI.

numDocs represents the number of searchable documents in the index (and will be larger than the number of XML, JSON, or CSV files since some files contained more than one document). The maxDoc value may be larger as the maxDoc count includes logically deleted documents that have not yet been physically removed from the index. You can re-post the sample files over and over again as much as you want and numDocs will never increase, because the new documents will constantly be replacing the old.

Go ahead and edit any of the existing example data files, change some of the data, and re-run the PostTool (bin/post). You’ll see your changes reflected in subsequent searches.

Deleting Data

If you need to iterate a few times to get your schema right, you may want to delete documents to clear out the collection and try again. Note, however, that merely removing documents doesn’t change the underlying field definitions. Essentially, this will allow you to reindex your data after making changes to fields for your needs.

You can delete data by POSTing a delete command to the update URL and specifying the value of the document’s unique key field, or a query that matches multiple documents (be careful with that one!). We can use bin/post to delete documents also if we structure the request properly.

Execute the following command to delete a specific document:

bin/post -c localDocs -d "<delete><id>SP2514N</id></delete>"

To delete all documents, you can use a "delete-by-query" command like:

bin/post -c localDocs -d "<delete><query>*:*</query></delete>"

You can also modify the above to only delete documents that match a specific query.

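For example, to delete only documents whose name field contains the (hypothetical) term test:

bin/post -c localDocs -d "<delete><query>name:test</query></delete>"
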
Exercise 3 Wrap Up

At this point, you’re ready to start working on your own.

Jump ahead to the overall wrap up when you’re ready to stop Solr and remove all the examples you worked with and start fresh.

Spatial Queries

Solr has sophisticated geospatial support, including searching within a specified distance range of a given location (or within a bounding box), sorting by distance, or even boosting results by the distance.

Some of the example techproducts documents we indexed in Exercise 1 have locations associated with them to illustrate the spatial capabilities. To re-index this data, see Exercise 1.

Spatial queries can be combined with any other types of queries, such as in this example of querying for "ipod" within 10 kilometers from San Francisco:

Figure 7. Spatial queries and results

This is from Solr’s example search UI (called /browse), which has a nice feature to show a map for each item and allow easy selection of the location to search near. You can see this yourself by going to http://localhost:8983/solr/techproducts/browse?q=ipod&pt=37.7752%2C-122.4232&d=10&sfield=store&fq=%7B%21bbox%7D&queryOpts=spatial&queryOpts=spatial in a browser.

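If you’d rather stay on the command line, roughly the same filter can be sent to the /select handler; this sketch uses the store location field from the techproducts data, with the {!bbox} filter URL-encoded as %7B%21bbox%7D:

curl "http://localhost:8983/solr/techproducts/select?q=ipod&fq=%7B%21bbox%7D&sfield=store&pt=37.7752,-122.4232&d=10"
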
To learn more about Solr’s spatial capabilities, see the section Spatial Search.

Wrapping Up

If you’ve run the full set of commands in this quick start guide you have done the following:

  • Launched Solr into SolrCloud mode, two nodes, two collections including shards and replicas

  • Indexed several types of files

  • Used the Schema API to modify your schema

  • Opened the admin console, used its query interface to get results

  • Opened the /browse interface to explore Solr’s features in a more friendly and familiar interface

Nice work!

Cleanup

As you work through this tutorial, you may want to stop Solr and reset the environment back to the starting point. The following command line will stop Solr and remove the directories for each of the two nodes that were created all the way back in Exercise 1:

bin/solr stop -all ; rm -Rf example/cloud/

Where to next?

This Guide will be your best resource for learning more about Solr.

Solr also has a robust community made up of people happy to help you get started. For more information, check out the Solr website’s Resources page.
