MySQL: Number of dimensions of vectors in the Information Retrieval Vector Space Model?

qxgroojn  posted on 2022-12-22  in  MySQL

Do the vectors in IR (document vectors) have a length (number of dimensions) equal to the number of terms in my dictionary?
If so, let's imagine I have a big dictionary with 1,000 terms and 10,000 documents in my database. Aren't the vectors extremely big, and do I really store 10,000 vectors with 1,000 values each in my database? Is there a better way?
As I said, this is one possibility, but it sounds wrong to me. Can I even store vectors in a MySQL database? I also read something about streams in C#.

jckbn6z7


"Do the vectors in IR (document vectors) have a length (number of dimensions) equal to the number of terms in my dictionary?"
For a "classic" vector space model system, yes. From "Vector Space Model":
"Documents and queries are represented as vectors. Each dimension corresponds to a separate term."
For the newer "word embedding" vectors, dimensionality reduction is frequently used, so the number of dimensions is smaller than the number of terms.
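To make the "one dimension per term" point concrete, here is a minimal Python sketch. The three toy documents, the whitespace tokenizer, the helper name `to_vector`, and the raw term-count weighting are all assumptions for illustration; a real system would more likely use TF-IDF or a similar weighting.

```python
from collections import Counter

# Toy corpus (hypothetical documents, for illustration only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# The dictionary: every distinct term in the collection, in a fixed order.
vocabulary = sorted({term for doc in docs for term in doc.split()})

def to_vector(text, vocabulary):
    """Return a dense term-count vector with one entry per dictionary term."""
    counts = Counter(text.split())
    return [counts[term] for term in vocabulary]

vectors = [to_vector(doc, vocabulary) for doc in docs]

# Every document vector has exactly len(vocabulary) dimensions.
for vec in vectors:
    print(len(vec), vec)
```

Note that each vector is mostly zeros even in this tiny example, because a document only contains a small part of the dictionary.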
"If so, let's imagine I have a big dictionary with 1,000 terms and 10,000 documents in my database. Aren't the vectors extremely big, and do I really store 10,000 vectors with 1,000 values each in my database?"
Yes, you do: that is 10,000 × 1,000 = 10 million values if every entry is stored.
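For any single document, most of those 1,000 values will be zero, so one common alternative is to keep only the non-zero entries. The following Python sketch shows that idea; the helper names `to_sparse` and `cosine` are hypothetical, and raw term counts stand in for whatever weighting you actually use.

```python
import math
from collections import Counter

def to_sparse(text):
    """Map each term that occurs in the text to its count; absent terms are implicitly 0."""
    return dict(Counter(text.split()))

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

doc = to_sparse("the cat sat on the mat")
query = to_sparse("cat on mat")

print(doc)                 # only 5 entries are stored, however large the dictionary is
print(cosine(doc, query))
```

The storage cost then depends on how many distinct terms each document contains, not on the size of the dictionary.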
"Can I even store vectors in a MySQL database?"
I think it would be possible, but difficult, and retrieval would be very slow because MySQL is not optimised for this.
Try using a specialised vector database instead, such as Weaviate (OSS), Pinecone (commercial), FAISS (OSS library), or one of the many others.
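If you still want to keep a classic term-weight model in a relational database, one possible layout is a row per non-zero (document, term) pair. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for MySQL so that it runs without a server. The table name `doc_term`, its columns, the helper `score`, and the sample rows are all assumptions, and the sketch says nothing about performance on a real collection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a MySQL database
conn.execute(
    """
    CREATE TABLE doc_term (
        doc_id INTEGER NOT NULL,
        term   TEXT    NOT NULL,
        weight REAL    NOT NULL,
        PRIMARY KEY (doc_id, term)
    )
    """
)

# Hypothetical sample data: only non-zero weights are stored.
rows = [
    (1, "cat", 1.0), (1, "mat", 1.0), (1, "sat", 1.0),
    (2, "dog", 1.0), (2, "log", 1.0), (2, "sat", 1.0),
]
conn.executemany("INSERT INTO doc_term VALUES (?, ?, ?)", rows)

def score(conn, query_weights):
    """Return (doc_id, dot product with the query) pairs, best match first."""
    placeholders = ",".join("?" for _ in query_weights)
    sql = (
        "SELECT doc_id, term, weight FROM doc_term "
        f"WHERE term IN ({placeholders})"
    )
    totals = {}
    # Fetch only rows for terms that appear in the query, then accumulate per document.
    for doc_id, term, weight in conn.execute(sql, list(query_weights)):
        totals[doc_id] = totals.get(doc_id, 0.0) + weight * query_weights[term]
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

print(score(conn, {"cat": 1.0, "sat": 1.0}))
```

This computes an unnormalised dot product rather than a full cosine similarity; document norms would need to be stored or computed separately.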
