
Building a search engine with Elasticsearch

Installation

We will start with the RTF distribution, which already ships with a number of plugins installed; once you are familiar with it you can build your own setup.
https://github.com/medcl/elasticsearch-rtf
Pick a directory and clone it:

git clone git://github.com/medcl/elasticsearch-rtf.git -b master --depth 1

The repository is fairly large, so be patient. You can also download the zip archive directly, which is often faster.

After the download finishes, open the folder, find elasticsearch.bat under the bin directory, and double-click it to start Elasticsearch.

Then open http://127.0.0.1:9200/ in a browser to confirm it is running.

Installing the head plugin and Kibana

elasticsearch-head

https://github.com/mobz/elasticsearch-head

git clone git://github.com/mobz/elasticsearch-head.git

cd elasticsearch-head

cnpm install

cnpm run start

open http://localhost:9100/

If you open http://localhost:9100/ in the browser now, the cluster shows as not connected and the console prints a pile of errors.

Open elasticsearch-rtf-master\config\elasticsearch.yml and add the following:

http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: Authorization, X-Requested-With, Content-Length, Content-Type

Restart elasticsearch.bat.

kibana

https://www.elastic.co/downloads/past-releases
Find the Kibana release whose version matches your Elasticsearch version and download it.
You can check the Elasticsearch version at http://127.0.0.1:9200/, e.g. number: "5.1.1".
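If you would rather check the version from Python, here is a minimal sketch (assuming the requests package is installed):

import requests

info = requests.get("http://127.0.0.1:9200/").json()
print(info["version"]["number"])  # e.g. "5.1.1"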

After downloading, unzip it, go into the bin directory and run kibana.bat.

Then open http://127.0.0.1:5601 to access Kibana.

Basic Elasticsearch operations

Open http://127.0.0.1:5601 and click the Dev Tools tab to run the commands below.

# Create an index
# number_of_shards: number of shards, cannot be changed after creation
# number_of_replicas: number of replicas, can be changed later
PUT lagou
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
# Get the index
GET lagou
# To get one part of the index definition, prefix it with an underscore
GET lagou/_settings
# Modify the settings
# Change the number of replicas to 2
PUT lagou/_settings
{
  "number_of_replicas": 2
}

# Try to change the number of shards to 2
# As noted above, the shard count cannot be changed after creation, so this request returns an error
PUT lagou/_settings
{
  "number_of_shards": 2
}
# Save a document
# With an explicit ID
PUT lagou/job/1
{
  "title": "python分布式爬虫开发",
  "salary_min": 15000,
  "city": "北京",
  "company": {
    "name": "百度",
    "company_addr": "北京软件园"
  },
  "publish_date": "2018-10-12",
  "comments": 15
}

# Without an ID (Elasticsearch generates one)
POST lagou/job/
{
  "title": "python分布式爬虫开发2",
  "salary_min": 15000,
  "city": "北京",
  "company": {
    "name": "百度",
    "company_addr": "北京软件园"
  },
  "publish_date": "2018-10-12",
  "comments": 15
}

Once the documents are inserted you can inspect them at `http://127.0.0.1:9100/` under the data browser tab.
# Retrieve documents

GET lagou/job/1
GET lagou/job/1?_source
# Retrieve only specific fields of a document
GET lagou/job/1?_source=title
GET lagou/job/1?_source=title,city,company

# Retrieve all documents
GET lagou/job/_search
# Modify fields
PUT lagou/job/1
{
  "title": "python分布式爬虫开发",
  "salary_min": 15000,
  "city": "北京"
}

POST lagou/job/1/_update
{
  "doc": {
    "title": "python开发"
  }
}

PUT overwrites the whole document: the old version is deleted and a new one is created from the request body.
POST .../_update only changes the fields listed under doc; all other fields are left untouched.
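The same pair of operations from Python, as a minimal sketch using the official low-level client (assumed installed via pip install elasticsearch, with a version matching the cluster):

from elasticsearch import Elasticsearch

es = Elasticsearch(["127.0.0.1:9200"])

# full replace: afterwards the document contains exactly this body
es.index(index="lagou", doc_type="job", id=1, body={
    "title": "python分布式爬虫开发",
    "salary_min": 15000,
    "city": "北京"
})

# partial update: only "title" changes, all other fields are kept
es.update(index="lagou", doc_type="job", id=1, body={"doc": {"title": "python开发"}})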
# Delete
# Delete a single document
DELETE lagou/job/1
# Delete the whole index
DELETE lagou/

Bulk operations in Elasticsearch

First insert some test data:

PUT testdb
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
PUT testdb/job1/1
{
  "title": "job1_1"
}
PUT testdb/job1/2
{
  "title": "job1_2"
}
PUT testdb/job2/1
{
  "title": "job2_1"
}
PUT testdb/job2/2
{
  "title": "job2_2"
}

Batch retrieval (_mget)

GET _mget
{
  "docs": [
    {
      "_index": "testdb",
      "_type": "job1",
      "_id": 1
    },
    {
      "_index": "testdb",
      "_type": "job2",
      "_id": 2
    }
  ]
}

GET testdb/_mget
{
  "docs": [
    {
      "_type": "job1",
      "_id": 1
    },
    {
      "_type": "job2",
      "_id": 2
    }
  ]
}

GET testdb/job1/_mget
{
  "docs": [
    {
      "_id": 1
    },
    {
      "_id": 2
    }
  ]
}

GET testdb/job1/_mget
{
  "ids": [1, 2]
}
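The shortest form above maps directly onto mget in the Python client (a sketch reusing the es client from the earlier example):

docs = es.mget(index="testdb", doc_type="job1", body={"ids": [1, 2]})
for doc in docs["docs"]:
    print(doc["_id"], doc.get("_source"))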

Bulk operations (_bulk)

Each operation in a _bulk request body is one action line carrying the metadata, optionally followed by one data line:

{"action":{"metadata"}}
{"data"}

Action types

  • delete: deletes a document; only the one metadata line is needed
  • update: performs a partial update
  • create: equivalent to PUT /index/type/id/_create, a forced create
  • index: an ordinary PUT; it can create a document or fully replace it
    Difference between create and index: if the document already exists, create fails with a "document already exists" error, while index still succeeds.
# delete
{"delete":{"_index":"test_index","_type":"test_type","_id":10}}

# create
{"create":{"_index":"test_index","_type":"test_type","_id":12}}
{"test_field": "test field 12"}

# index
{"index":{"_index":"test_index","_type":"test_type","_id":2}}
{"test_field2": "reindex test field2"}

# update
{"update": {"_index":"test_index","_type":"test_type","_id":1}}
{"doc":{"test_field1": "partial update test field2"}}

Basic usage

PUT _bulk
{"index":{"_index":"lagou","_type":"job","_id":1}}
{"title":"python分布式爬虫开发1","salary_min":15000,"city":"北京","company":{"name":"百度","company_addr":"北京软件园"},"publish_date":"2018-10-12","comments":15}
{"index":{"_index":"lagou","_type":"job","_id":2}}
{"title":"python分布式爬虫开发2","salary_min":15000,"city":"北京","company":{"name":"百度","company_addr":"北京软件园"},"publish_date":"2018-10-12","comments":15}
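From Python, the same request is usually built with the bulk helper instead of hand-written NDJSON (a minimal sketch, again assuming the elasticsearch package and the es client from above):

from elasticsearch import helpers

actions = [
    {
        "_op_type": "index",  # could also be "create", "update" or "delete"
        "_index": "lagou",
        "_type": "job",
        "_id": i,
        "_source": {"title": "python分布式爬虫开发%d" % i, "salary_min": 15000, "city": "北京"},
    }
    for i in (1, 2)
]
helpers.bulk(es, actions)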

Elasticsearch mappings

A mapping constrains the type of each field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Searching in Elasticsearch

First create an index with a mapping:

PUT lagou
{
  "mappings": {
    "job": {
      "properties": {
        "title": {
          "store": true,
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "company_name": {
          "store": true,
          "type": "keyword"
        },
        "desc": {
          "type": "text"
        },
        "comments": {
          "type": "integer"
        },
        "add_time": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}

Add some documents:

POST lagou/job/
{
  "title": "python django 开发工程师",
  "company_name": "美团科技有限公司",
  "desc": "熟悉python基础知识,熟悉django",
  "comments": 20,
  "add_time": "2018-9-13"
}

POST lagou/job/
{
  "title": "python scrapy redis 分布式爬虫",
  "company_name": "百度科技有限公司",
  "desc": "熟悉scrapy,熟悉redis",
  "comments": 8,
  "add_time": "2018-10-13"
}

POST lagou/job/
{
  "title": "elasticsearch打造搜索引擎",
  "company_name": "阿里巴巴科技有限公司",
  "desc": "熟悉数据结构算法,熟悉python的基本开发",
  "comments": 16,
  "add_time": "2018-6-20"
}

POST lagou/job/
{
  "title": "python 打造搜索引擎系统",
  "company_name": "阿里巴巴科技有限公司",
  "desc": "熟悉python基础知识,熟悉c语言",
  "comments": 20,
  "add_time": "2018-10-13"
}

match query

# match query
GET lagou/job/_search
{
  "query": {
    "match": {
      "title": "python网站"
    }
  }
}

The analyzer splits the query string python网站 into terms, and any document whose title contains one of those terms is returned.
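The same query through elasticsearch-dsl, which the later Python code builds on (a minimal sketch; the connection setup mirrors the es_types.py module further down):

from elasticsearch_dsl import Search
from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=["localhost"])

s = Search(index="lagou").query("match", title="python网站")
for hit in s.execute():
    print(hit.title)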

Inspecting the analyzer output

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["python web 开发"]
}
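The analyze API can also be called from Python; the gen_suggests helper further down relies on exactly this call (a sketch reusing the es client defined earlier):

result = es.indices.analyze(index="lagou", analyzer="ik_max_word", body="python web 开发")
print([token["token"] for token in result["tokens"]])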

term query

# term query
GET lagou/job/_search
{
  "query": {
    "term": {
      "title": "python开发"
    }
  }
}

A term query does not analyze python开发, so it only matches documents whose title contains that exact term; check the mapping before using it on other fields.

terms query

# terms query
GET lagou/job/_search
{
  "query": {
    "terms": {
      "title": ["python", "开发", "搜索"]
    }
  }
}

A document matches as long as its title contains any one of the terms in the array.

Limiting the number of results

GET lagou/_search
{
  "query": {
    "match": {
      "title": "python"
    }
  },
  "from": 0,
  "size": 3
}

query: the query whose hits are returned
from: offset of the first hit to return
size: number of hits to return

match_all query

GET lagou/_search
{
  "query": {
    "match_all": {}
  }
}

Returns every document.

match_phrase query

GET lagou/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "python系统",
        "slop": 7
      }
    }
  }
}

python系统 is analyzed into the terms python and 系统.
slop: the maximum distance allowed between those terms in the stored field.
In "python 打造搜索引擎系统" the two terms are 7 positions apart, so with a slop of 6 this document would not be found.

multi_match query

GET lagou/_search
{
  "query": {
    "multi_match": {
      "query": "python",
      "fields": ["title^3", "desc"]
    }
  }
}

multi_match lets you search several fields at once.
title^3 boosts the title field, so matches there rank higher in the results.
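The equivalent with elasticsearch-dsl query objects (a sketch, reusing the Search setup from the match example above):

from elasticsearch_dsl.query import MultiMatch

mm = MultiMatch(query="python", fields=["title^3", "desc"])
s = Search(index="lagou").query(mm)
print(s.to_dict())  # shows the raw query body that will be sent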

Specifying the returned fields

GET lagou/_search
{
  "stored_fields": ["title", "company_name"],
  "query": {
    "match": {
      "title": "python"
    }
  }
}

stored_fields controls which fields are returned.
Only title and company_name were defined with "store": true in the mapping;
any other field listed in stored_fields will not be returned.

Sorting results

GET lagou/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [{
    "comments": "desc"
  }]
}
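With elasticsearch-dsl the same sort is chained onto the Search object (sketch, same assumptions as above):

s = Search(index="lagou").query("match_all").sort("-comments")  # a leading "-" means descending
response = s.execute()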

Range queries

GET lagou/_search
{
  "query": {
    "range": {
      "comments": {
        "gte": 10,
        "lte": 20,
        "boost": 2.0
      }
    }
  }
}
GET lagou/_search
{
  "query": {
    "range": {
      "add_time": {
        "gte": "2018-10-1",
        "lte": "now"
      }
    }
  }
}

gte: greater than or equal to (>=)
lte: less than or equal to (<=)

wildcard query

GET lagou/_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "pyth*n",
        "boost": 2.0
      }
    }
  }
}

Compound (bool) queries

# bool compound query
# filter: filters the result set without contributing to the score
# must: every listed clause must match (AND)
# should: it is enough for one or more of the listed clauses to match (OR)
# must_not: none of the listed clauses may match (NOT)

A bool query matches documents matching boolean combinations of other queries; it maps to a Lucene BooleanQuery. It is built from one or more clauses, each with a typed occurrence:

must: the clause (query) must appear in matching documents and contributes to the score.
filter: the clause (query) must appear in matching documents, but unlike must its score is ignored. Filter clauses run in filter context, so scoring is skipped and the clause can be cached.
should: the clause (query) should appear in matching documents. If the bool query is in query context and has a must or filter clause, a document matches even if no should clause does; the should clauses then only affect the score. If the bool query is in filter context, or has neither must nor filter, at least one should clause must match. This behaviour can be controlled explicitly with the minimum_should_match parameter.
must_not: the clause (query) must not appear in matching documents. It runs in filter context, so scoring is ignored (all documents get a score of 0) and the clause can be cached.
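In elasticsearch-dsl, bool queries are normally built by combining Q objects with |, & and ~ (a minimal sketch under the same assumptions as the earlier Python examples; it expresses the second SQL example below):

from elasticsearch_dsl import Q, Search

# (salary = 20 OR title = "python") AND NOT salary = 30
q = (Q("term", salary=20) | Q("term", title="python")) & ~Q("term", salary=30)
s = Search(index="lagou", doc_type="testjob").query(q)
print(s.to_dict())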

Create some test data:

POST lagou/testjob/_bulk
{"index":{"_id":1}}
{"salary":10,"title":"python"}
{"index":{"_id":2}}
{"salary":20,"title":"scrapy"}
{"index":{"_id":3}}
{"salary":30,"title":"django"}
{"index":{"_id":4}}
{"salary":30,"title":"elasticsearch"}

sql: select * from testjob where salary=20

GET lagou/testjob/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "salary": 20
        }
      }
    }
  }
}

sql: select * from testjob where (salary=20 or title=python) and (salary!=30)

GET lagou/testjob/_search
{
  "query": {
    "bool": {
      "should": [
        {"term": {"salary": 20}},
        {"term": {"title": "python"}}
      ],
      "must_not": [
        {"term": {"salary": 30}}
      ]
    }
  }
}

sql: select * from testjob where title="python" or (title="elasticsearch" and salary=30)

GET lagou/testjob/_search
{
  "query": {
    "bool": {
      "should": [
        {"term": {"title": "python"}},
        {"bool": {
          "must": [
            {"term": {"title": "elasticsearch"}},
            {"term": {"salary": 30}}
          ]
        }}
      ]
    }
  }
}

Filtering on empty and non-empty fields

Create some test data:

POST lagou/testjob2/_bulk
{"index":{"_id":1}}
{"tags":["search"]}
{"index":{"_id":2}}
{"tags":["search", "python"]}
{"index":{"_id":3}}
{"other_field":["some data"]}
{"index":{"_id":4}}
{"tags":null}
{"index":{"_id":5}}
{"tags":["search", null]}

sql: select tags from testjob2 where tags is not NULL

GET lagou/testjob2/_search
{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "tags"
        }
      }
    }
  }
}

sql: select tags from testjob2 where tags is NULL

GET lagou/testjob2/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "tags"
        }
      }
    }
  }
}

Writing the scraped data into Elasticsearch

https://github.com/elastic/elasticsearch-dsl-py
Make sure the library version matches your Elasticsearch version.

pip install elasticsearch-dsl==5.2.0

Inside ArticleSpider, create a models package at the same level as the spiders package,
then create ArticleSpider/models/es_types.py:

from datetime import datetime
from elasticsearch_dsl import Date, Integer, Keyword, Text, DocType, Completion
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl.analysis import CustomAnalyzer as _CustomAnalyzer

connections.create_connection(hosts=["localhost"])


class CustomAnalyzer(_CustomAnalyzer):
    def get_analysis_definition(self):
        return {}


ik_analyzer = CustomAnalyzer("ik_max_word", filter=["lowercase"])


class ArticleType(DocType):
    # document type for jobbole articles

    title = Text(analyzer="ik_max_word")
    create_date = Date()
    url = Keyword()
    url_object_id = Keyword()
    front_image_url = Keyword()
    front_image_path = Keyword()
    praise_nums = Integer()
    comment_nums = Integer()
    fav_nums = Integer()
    tags = Text(analyzer="ik_max_word")
    content = Text(analyzer="ik_max_word")

    class Meta:
        index = "jobbole"
        doc_type = "article"


if __name__ == "__main__":
    ArticleType.init()

ArticleSpider/items.py

# add to the imports at the top of items.py:
from w3lib.html import remove_tags

from ArticleSpider.models.es_types import ArticleType


class JobBoleArticleItem(scrapy.Item):
    ...

    def save_to_es(self):
        article = ArticleType()
        article.title = self['title']
        article.create_date = self["create_date"]
        article.content = remove_tags(self["content"])
        article.front_image_url = self["front_image_url"]
        if "front_image_path" in self:
            article.front_image_path = self["front_image_path"]
        article.praise_nums = self["praise_nums"]
        article.fav_nums = self["fav_nums"]
        article.comment_nums = self["comment_nums"]
        article.url = self["url"]
        article.tags = self["tags"]
        article.meta.id = self["url_object_id"]

        article.save()

ArticleSpider/pipelines.py

class ElasticsearchPipeline(object):
    # write items into Elasticsearch

    def process_item(self, item, spider):
        # convert the item into an ES document and save it
        item.save_to_es()

        return item

The other two item classes are handled the same way.
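Remember to register the pipeline in settings.py so Scrapy actually calls it; a minimal sketch (the priority value 300 is just an example):

# ArticleSpider/settings.py
ITEM_PIPELINES = {
    'ArticleSpider.pipelines.ElasticsearchPipeline': 300,
}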

Search suggestions

https://www.elastic.co/guide/en/elasticsearch/reference/5.1/search-suggesters.html

Add a suggest field to the model:

class ArticleType(DocType):
    # document type for jobbole articles
    suggest = Completion(analyzer=ik_analyzer)
    ...

Populate suggest in the item:

def save_to_es(self):
    article.suggest = gen_suggests(ArticleType._doc_type.index, ((article.title, 10), (article.tags, 7)))

from elasticsearch_dsl.connections import connections

es = connections.create_connection(ArticleType._doc_type.using)

def gen_suggests(index, info_tuple):
    # build the suggest payload for a document from (text, weight) pairs
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        if text:
            # call the ES analyze API to tokenize the string
            words = es.indices.analyze(index=index, analyzer="ik_max_word", params={'filter': ["lowercase"]}, body=text)
            analyzed_words = set([r["token"] for r in words["tokens"] if len(r["token"]) > 1])
            new_words = analyzed_words - used_words
            used_words.update(new_words)  # avoid re-suggesting the same word with a lower weight
        else:
            new_words = set()

        if new_words:
            suggests.append({"input": list(new_words), "weight": weight})

    return suggests

Setting up Django and implementing autocomplete

Create a new Django project, copy the static files and templates into it, and configure the static file paths.
Then configure the URLs, starting with the home page:

from django.contrib import admin
from django.urls import path
from django.views.generic import TemplateView

from search.views import SearchSuggest, SearchView

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', TemplateView.as_view(template_name='index.html'), name="index"),

    path('suggest/', SearchSuggest.as_view(), name="suggest"),

    path('search/', SearchView.as_view(), name="search"),
]

Install the library in the Django project's environment as well: pip install elasticsearch-dsl==5.2.0
search/models.py

from django.db import models

# Create your models here.
from datetime import datetime
from elasticsearch_dsl import DocType, Date, Nested, Boolean, \
    analyzer, InnerObjectWrapper, Completion, Keyword, Text, Integer

from elasticsearch_dsl.analysis import CustomAnalyzer as _CustomAnalyzer

from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=["localhost"])


class CustomAnalyzer(_CustomAnalyzer):
    def get_analysis_definition(self):
        return {}


ik_analyzer = CustomAnalyzer("ik_max_word", filter=["lowercase"])


class ArticleType(DocType):
    # document type for jobbole articles
    suggest = Completion(analyzer=ik_analyzer)
    title = Text(analyzer="ik_max_word")
    create_date = Date()
    url = Keyword()
    url_object_id = Keyword()
    front_image_url = Keyword()
    front_image_path = Keyword()
    praise_nums = Integer()
    comment_nums = Integer()
    fav_nums = Integer()
    tags = Text(analyzer="ik_max_word")
    content = Text(analyzer="ik_max_word")

    class Meta:
        index = "jobbole"
        doc_type = "article"


if __name__ == "__main__":
    ArticleType.init()

Before implementing the suggest view, a quick look at fuzzy search.

fuzzy queries (fuzzy search)

GET jobbole/_search
{
  "query": {
    "fuzzy": {
      "title": "linx"
    }
  },
  "_source": ["title"]
}

GET jobbole/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "linx",
        "fuzziness": 2,
        "prefix_length": 0
      }
    }
  },
  "_source": ["title"]
}

fuzziness: the maximum edit distance allowed, e.g. "linx" still matches "linux" because only one insertion is needed
prefix_length: the number of leading characters that must match exactly

Search suggestions

POST jobbole/_search?pretty
{
  "suggest": {
    "my-suggest": {
      "text": "linux",
      "completion": {
        "field": "suggest",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  },
  "_source": "title"
}

The view function

import json

from django.http import HttpResponse
from django.views.generic.base import View

from search.models import ArticleType


class SearchSuggest(View):
    def get(self, request):
        key_words = request.GET.get('s', '')
        re_datas = []
        if key_words:
            s = ArticleType.search()
            s = s.suggest('my_suggest', key_words, completion={
                "field": "suggest", "fuzzy": {
                    "fuzziness": 2
                },
                "size": 10
            })
            suggestions = s.execute_suggest()
            for match in suggestions.my_suggest[0].options:
                source = match._source
                re_datas.append(source["title"])
        return HttpResponse(json.dumps(re_datas), content_type="application/json")

Run the spider to write some data into Elasticsearch.

Implementing search

# additional imports at the top of views.py:
from django.shortcuts import render
from elasticsearch import Elasticsearch

# low-level client used to send the raw search body below
# (not shown in the original snippet)
client = Elasticsearch(hosts=["127.0.0.1"])


class SearchView(View):
    def get(self, request):
        key_words = request.GET.get("q", "")

        response = client.search(
            index="jobbole",
            body={
                "query": {
                    "multi_match": {
                        "query": key_words,
                        "fields": ["tags", "title", "content"]
                    }
                },
                "size": 10,
                "highlight": {
                    "pre_tags": ['<span class="keyWord">'],
                    "post_tags": ['</span>'],
                    "fields": {
                        "title": {},
                        "content": {},
                    }
                }
            }
        )

        total_nums = response["hits"]["total"]
        hit_list = []
        for hit in response["hits"]["hits"]:
            hit_dict = {}
            if "title" in hit["highlight"]:
                hit_dict["title"] = "".join(hit["highlight"]["title"])
            else:
                hit_dict["title"] = hit["_source"]["title"]
            if "content" in hit["highlight"]:
                hit_dict["content"] = "".join(hit["highlight"]["content"])[:500]
            else:
                hit_dict["content"] = hit["_source"]["content"][:500]

            hit_dict["create_date"] = hit["_source"]["create_date"]
            hit_dict["url"] = hit["_source"]["url"]
            hit_dict["score"] = hit["_score"]

            hit_list.append(hit_dict)

        return render(request, "result.html", {
            "all_hits": hit_list,
            "key_words": key_words,
            "total_nums": total_nums,
        })

Then fill the results into the HTML template.

Pagination, my searches, and top searches

When the spider saves an item into Elasticsearch, also keep a running count of saved items in Redis:
pip install redis

import redis

redis_cli = redis.StrictRedis()


class JobBoleArticleItem(scrapy.Item):
    ...

    def save_to_es(self):
        ...

        redis_cli.incr('jobbole_count')

Pagination

# in addition to the imports in the previous version of views.py:
from datetime import datetime

import redis

redis_cli = redis.StrictRedis()


class SearchView(View):
    def get(self, request):
        key_words = request.GET.get("q", "")
        s_type = request.GET.get("s_type", "article")

        # top searches
        redis_cli.zincrby("search_keywords_set", key_words)
        topn_search = redis_cli.zrevrangebyscore("search_keywords_set", "+inf", "-inf", start=0, num=5)

        # pagination
        page = request.GET.get("p", "1")
        try:
            page = int(page)
        except:
            page = 1

        jobbole_count = redis_cli.get("jobbole_count")
        start_time = datetime.now()
        response = client.search(
            index="jobbole",
            body={
                "query": {
                    "multi_match": {
                        "query": key_words,
                        "fields": ["tags", "title", "content"]
                    }
                },
                "from": (page-1)*10,
                "size": 10,
                "highlight": {
                    "pre_tags": ['<span class="keyWord">'],
                    "post_tags": ['</span>'],
                    "fields": {
                        "title": {},
                        "content": {},
                    }
                }
            }
        )

        end_time = datetime.now()
        last_seconds = (end_time-start_time).total_seconds()
        total_nums = response["hits"]["total"]
        # compute the number of pages from the total hit count
        if (total_nums % 10) > 0:
            page_nums = int(total_nums/10) + 1
        else:
            page_nums = int(total_nums/10)
        hit_list = []
        for hit in response["hits"]["hits"]:
            hit_dict = {}
            if "title" in hit.get("highlight", ""):
                hit_dict["title"] = "".join(hit["highlight"]["title"])
            else:
                hit_dict["title"] = hit["_source"]["title"]
            if "content" in hit.get("highlight", ""):
                hit_dict["content"] = "".join(hit["highlight"]["content"])[:400]+"...>"
            else:
                hit_dict["content"] = hit["_source"]["content"][:400]+"...>"

            hit_dict["create_date"] = hit["_source"]["create_date"]
            hit_dict["url"] = hit["_source"]["url"]
            hit_dict["score"] = hit["_score"]

            hit_list.append(hit_dict)

        return render(request, "result.html", {"page": page,
                                                "all_hits": hit_list,
                                                "key_words": key_words,
                                                "total_nums": total_nums,
                                                "page_nums": page_nums,
                                                "last_seconds": last_seconds,
                                                "jobbole_count": jobbole_count,
                                                "topn_search": topn_search})

var key_words = "{{ key_words }}"
// pagination
$(".pagination").pagination({{ total_nums }}, {
    current_page: {{ page|add:'-1' }}, // current page number
    items_per_page: 10,
    display_msg: true,
    callback: pageselectCallback
});

function pageselectCallback(page_id, jq) {
    page_id = parseInt(page_id) + parseInt(1);
    window.location.href = search_url + '?q=' + key_words + '&p=' + page_id
}

My searches

function add_search() {
    var val = $(".searchInput").val();
    if (val.length >= 2) {
        // de-duplicate when the search button is clicked
        KillRepeat(val);
        // store the de-duplicated array in the browser's localStorage
        localStorage.search = searchArr;
        // then render the stored searches
        MapSearchArr();
    }

    window.location.href = search_url + '?q=' + val + "&s_type=" + $(".searchItem.current").attr('data-type')

}
