Dify full-text search problem

iqjalb3h  posted 2 months ago  in  Other

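Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.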

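Dify version

0.6.14

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I deploy Dify on Docker, embed the relevant documents, and then run a recall test, full-text search returns nothing. The same steps work correctly on the cloud service.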

pcrecxhr #1

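In Dify 0.6.14, full-text search on a self-hosted Docker deployment may return empty results because of an underlying problem with the Weaviate container. The deployment logs show the Weaviate container running, but there may be a configuration or connectivity issue specific to the self-hosted environment.
To resolve this, make sure the Weaviate container is correctly configured and reachable. You can check the Weaviate container's logs for errors: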

docker logs docker-weaviate-1

In addition, verify that the Weaviate service is correctly integrated with the Dify application and that it can access the data it needs for indexing and search [1] (https://github.com/langgenius/dify-docs/blob/main/zh_CN/learn-more/use-cases/practical-implementation-of-building-llm-applications-using-a-full-set-of-open-source-tools.md).
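
Weaviate writes its logs as one JSON object per line, so scanning them for problems can be scripted. The following is a minimal sketch, assuming each line carries a "level" field as in typical Weaviate output; the helper name `problem_entries`, the set of levels, and the sample lines are illustrative and not part of Dify or Weaviate.

```python
import json
from typing import Iterable, Iterator

# Levels treated as noteworthy here; this set is an assumption, adjust as needed.
PROBLEM_LEVELS = {"warning", "error", "fatal", "panic"}


def problem_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log entries whose "level" is in PROBLEM_LEVELS."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Not a JSON log line (for example a wrapped or plain-text line); skip it.
            continue
        if not isinstance(entry, dict):
            continue
        if str(entry.get("level", "")).lower() in PROBLEM_LEVELS:
            yield entry


# Hypothetical sample lines, used only to demonstrate the filter.
sample = [
    '{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051"}',
    '{"action":"example","level":"error","msg":"hypothetical failure"}',
    "not json at all",
]

for entry in problem_entries(sample):
    print(entry["level"], entry.get("action"), entry["msg"])
```

To apply this to real logs, the output of `docker logs docker-weaviate-1 2>&1` could be piped into a script that calls `problem_entries(sys.stdin)`. If nothing is reported, the container started without logged warnings or errors, and the cause is more likely in configuration or indexing.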

  • To continue talking to Dosu, mention @dosu.

uajslkp6 #2

{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-07-19T10:12:41Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-07-19T10:12:41Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-07-19T10:12:42Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-07-19T10:12:42Z"}
{"action":"hnsw_vector_cache_prefill","count":25000,"index_id":"vector_index_64401738_1db4_4ee0_8aac_9f156568f022_node_QgLe6uhrio1J","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-07-19T10:18:02Z","took":960441}
{"action":"restapi_management","level":"info","msg":"Shutting down... ","time":"2024-07-22T01:33:46Z"}
{"action":"restapi_management","level":"info","msg":"Stopped serving weaviate at http://[::]:8080","time":"2024-07-22T01:33:46Z"}
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-07-22T01:36:47Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-07-22T01:36:47Z"}
{"action":"hnsw_vector_cache_prefill","count":50000,"index_id":"vector_index_64401738_1db4_4ee0_8aac_9f156568f022_node_QgLe6uhrio1J","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-07-22T01:36:49Z","took":2111166}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-07-22T01:36:49Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-07-22T01:36:49Z"}

kxxlusnw #3

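In Dify 0.6.14, full-text search on a self-hosted Docker deployment may fail because the default vectorizer module in the Weaviate container is set to "none". This means that new schema classes without an explicit vectorizer setting will not use any vectorizer, and a vectorizer is essential for full-text search.
To resolve this, configure Weaviate to use an appropriate vectorizer module. The steps are:

  1. Edit the Weaviate configuration: update it to use a vectorizer module, either by modifying the docker-compose.yml file or the Weaviate configuration file.
  2. Set the vectorizer module: make sure the vectorizer module is set to a suitable value, such as text2vec-contextionary.
    Here is an example of how to update docker-compose.yml to include the vectorizer module: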
version: '3.7'
services:
  weaviate:
    image: semitechnologies/weaviate:1.18.4
    environment:
      - QUERY_DEFAULTS_LIMIT=20
      - AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true
      - PERSISTENCE_DATA_PATH=/var/lib/weaviate
      - DEFAULT_VECTORIZER_MODULE=text2vec-contextionary
      - ENABLE_MODULES=text2vec-contextionary
    ports:
      - "8080:8080"
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
  3. Restart the Docker containers: after making the changes, restart the containers to apply the new configuration.
docker compose down
docker compose up -d

With DEFAULT_VECTORIZER_MODULE set to text2vec-contextionary and the module enabled, Weaviate will use that vectorizer for new schema classes, which should resolve the empty full-text search results.
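
One way to confirm that the module setting took effect after the restart is to ask Weaviate which modules it has loaded. The sketch below assumes that Weaviate's REST endpoint `GET /v1/meta` is reachable from where the script runs and that the response lists loaded modules under a "modules" key. The helper names, the base URL, and the API key handling are illustrative assumptions, not part of Dify.

```python
import json
import urllib.request
from typing import List, Optional


def fetch_meta(base_url: str, api_key: Optional[str] = None) -> dict:
    """GET <base_url>/v1/meta and return the decoded JSON body."""
    request = urllib.request.Request(base_url.rstrip("/") + "/v1/meta")
    if api_key:
        # Only needed when API-key authentication is enabled on the instance.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def enabled_modules(meta: dict) -> List[str]:
    """Return the sorted module names reported under the "modules" key."""
    return sorted((meta.get("modules") or {}).keys())
```

For example, `enabled_modules(fetch_meta("http://localhost:8080"))` would be expected to include `text2vec-contextionary` if the environment variables above were picked up; the URL is an assumption based on the published port 8080.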

xfyts7mz #4

Please confirm that you are using Qdrant with the tag langgenius/qdrant as your vector database, @gz-simon.

gmxoilav #5

@dosu
weaviate:
  image: semitechnologies/weaviate:1.25.8
  ports:
    - "8080:8080"
    - "6060:6060"
    - "2112:2112"
    - "7101:7101"
    - "50051:50051"
  profiles:
    - ''
    - weaviate
  restart: always
  volumes:
    # Mount the Weaviate data directory to the container.
    - ./volumes/weaviate:/var/lib/weaviate
  environment:
    # The Weaviate configurations
    # You can refer to the [Weaviate](https://weaviate.io/developers/weaviate/config-refs/env-vars) documentation for more information.
    PERSISTENCE_DATA_PATH: ${WEAVIATE_PERSISTENCE_DATA_PATH:-/var/lib/weaviate}
    QUERY_DEFAULTS_LIMIT: ${WEAVIATE_QUERY_DEFAULTS_LIMIT:-25}
    CONTEXTIONARY_URL: contextionary:9999
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
    ENABLE_MODULES: 'text2vec-contextionary'
    DEFAULT_VECTORIZER_MODULE: 'text2vec-contextionary'
    CLUSTER_HOSTNAME: ${WEAVIATE_CLUSTER_HOSTNAME:-node1}
    CLUSTER_GOSSIP_BIND_PORT: '7100'
    CLUSTER_DATA_BIND_PORT: '7101'
    AUTHENTICATION_APIKEY_ENABLED: ${WEAVIATE_AUTHENTICATION_APIKEY_ENABLED:-true}
    AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_AUTHENTICATION_APIKEY_ALLOWED_KEYS:-WVF5YThaHlkYwhGUSmCRgsX3tD5ngdN8pkih}
    AUTHENTICATION_APIKEY_USERS: ${WEAVIATE_AUTHENTICATION_APIKEY_USERS:-hello@dify.ai}
    AUTHORIZATION_ADMINLIST_ENABLED: ${WEAVIATE_AUTHORIZATION_ADMINLIST_ENABLED:-true}
    AUTHORIZATION_ADMINLIST_USERS: ${WEAVIATE_AUTHORIZATION_ADMINLIST_USERS:-hello@dify.ai}
contextionary:
  image: semitechnologies/contextionary:en0.16.0-v1.2.1
  ports:
    - "9999:9999"
  environment:
    OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
    EXTENSIONS_STORAGE_MODE: weaviate
    EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080

I still have the problem.

hc8w905p #6

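In Dify 0.6.14, a possible cause of full-text search failing on a self-hosted Docker deployment is that DEFAULT_VECTORIZER_MODULE in the Weaviate container is set to "none". However, according to the configuration you provided, you appear to have already set DEFAULT_VECTORIZER_MODULE to text2vec-contextionary.
To troubleshoot further, please check the following:

  1. Check the Weaviate logs: verify that Weaviate is using the text2vec-contextionary module correctly by looking for any errors or warnings related to the vectorizer module.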
docker logs docker-weaviate-1
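  2. Schema configuration: make sure the schema classes in Weaviate are explicitly set to use the text2vec-contextionary vectorizer. You can verify this by inspecting the schema configuration in Weaviate.
  3. Data indexing: confirm that the data has been indexed correctly with the text2vec-contextionary vectorizer. Re-index the data if necessary.
  4. Network configuration: make sure the contextionary service is reachable from the Weaviate container. You can test this by running a network check inside the Weaviate container.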
docker exec -it docker-weaviate-1 curl http://contextionary:9999
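
The schema check in step 2 can be scripted. The sketch below assumes that Weaviate's REST endpoint `GET /v1/schema` returns a "classes" list in which each entry has a "class" name and a "vectorizer" field. The helper names and the base URL are illustrative assumptions.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def fetch_schema(base_url: str, api_key: Optional[str] = None) -> dict:
    """GET <base_url>/v1/schema and return the decoded JSON body."""
    request = urllib.request.Request(base_url.rstrip("/") + "/v1/schema")
    if api_key:
        # Only needed when API-key authentication is enabled on the instance.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def class_vectorizers(schema: dict) -> Dict[str, str]:
    """Map each class name to the vectorizer recorded in the schema."""
    return {
        cls["class"]: cls.get("vectorizer", "<unset>")
        for cls in schema.get("classes") or []
    }


def classes_not_using(schema: dict, expected: str) -> List[str]:
    """List the classes whose vectorizer differs from the expected one."""
    return sorted(
        name for name, vec in class_vectorizers(schema).items() if vec != expected
    )
```

Calling `classes_not_using(fetch_schema("http://localhost:8080"), "text2vec-contextionary")` would list the classes still on another setting. Classes created before the configuration change keep the vectorizer they were created with, which is why step 3 mentions re-indexing.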

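If the problem persists, restart the Weaviate container after updating the configuration and share its logs. This will help identify any specific errors or misconfiguration.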
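
Independently of the vectorizer settings, it may help to check whether a keyword query returns anything when sent to Weaviate directly. The sketch below builds a BM25 query for Weaviate's GraphQL endpoint (`POST /v1/graphql`). The class name, the "text" property, and the helper names are assumptions for illustration; the actual class and property names should be read from the schema.

```python
import json
import urllib.request
from typing import List, Optional


def build_bm25_query(class_name: str, text: str, properties: List[str], limit: int = 4) -> str:
    """Build a GraphQL Get query that runs a BM25 keyword search on one class."""
    return (
        "{ Get { %s(bm25: {query: %s, properties: %s}, limit: %d) "
        "{ %s _additional { score } } } }"
        % (
            class_name,
            json.dumps(text),        # quoted and escaped string literal
            json.dumps(properties),  # list of property names to search
            limit,
            " ".join(properties),    # return the searched properties
        )
    )


def run_graphql(base_url: str, query: str, api_key: Optional[str] = None) -> dict:
    """POST the query to <base_url>/v1/graphql and return the decoded JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A call such as `run_graphql("http://localhost:8080", build_bm25_query("Example_class", "some keyword", ["text"]))` uses a hypothetical class name. If the direct query returns hits while the recall test does not, the problem is more likely in how Dify is configured than in the Weaviate index itself.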
