In information retrieval (IR), does a document vector have as many dimensions as there are terms in my dictionary?
If so, let's imagine I have a big dictionary with 1,000 terms and 10,000 documents in my database. Aren't the vectors extremely big? Do I really store 10,000 vectors with 1,000 values each in my database? Is there a better way?
As I said, this is one possibility, but it seems wrong to me. Can I even store vectors in a MySQL database? I also read something about streams in C#.
Answer:
In information retrieval (IR), does a document vector have as many dimensions as there are terms in my dictionary?
For a "classic" vector space model system, yes. From the Vector Space Model article:
Documents and queries are represented as vectors. Each dimension corresponds to a separate term.
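To make "one dimension per term" concrete, here is a minimal sketch that builds a vocabulary from a tiny collection and turns each document into a dense vector with one slot per vocabulary term. It assumes whitespace tokenisation, lowercasing, and raw term counts as weights (no stemming, stop-word removal, or tf-idf); the function names are hypothetical.

```python
from collections import Counter

def build_vocabulary(docs):
    """Map each distinct term in the collection to a dimension index."""
    terms = sorted({term for doc in docs for term in doc.lower().split()})
    return {term: i for i, term in enumerate(terms)}

def term_frequency_vector(doc, vocab):
    """Dense term-frequency vector: one slot per vocabulary term."""
    counts = Counter(doc.lower().split())
    vec = [0] * len(vocab)
    for term, count in counts.items():
        if term in vocab:  # terms outside the vocabulary are ignored
            vec[vocab[term]] = count
    return vec

docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(docs)  # cat, dog, mat, on, sat, the
vectors = [term_frequency_vector(d, vocab) for d in docs]
print(len(vocab))   # → 6
print(vectors[0])   # → [1, 0, 0, 0, 1, 1]
```

Every vector has exactly `len(vocab)` entries, so with a 1,000-term dictionary each document vector has 1,000 dimensions, most of them zero.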
For newer "word embedding" vectors, dimensionality reduction is frequently used, so the number of dimensions is smaller than the number of terms.
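The answer does not name a specific reduction method. The sketch below uses random projection, one of the simplest dimensionality-reduction techniques, only to illustrate mapping a 1,000-dimensional term vector to a much shorter dense vector. Word embeddings such as word2vec are learned by training a model and are not produced this way. The target size of 50, the seed, and the term weights are hypothetical.

```python
import math
import random

def random_projection_matrix(input_dim, output_dim, seed=0):
    """Gaussian random matrix scaled by 1/sqrt(output_dim).

    Random projection approximately preserves distances between vectors
    (Johnson-Lindenstrauss lemma) while shrinking the dimensionality.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(output_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(input_dim)]
            for _ in range(output_dim)]

def project(vec, matrix):
    """Multiply the (output_dim x input_dim) matrix by a column vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

vocab_size, reduced_dim = 1000, 50
matrix = random_projection_matrix(vocab_size, reduced_dim)

doc_vec = [0.0] * vocab_size
doc_vec[3], doc_vec[42], doc_vec[999] = 2.0, 1.0, 1.0  # hypothetical term weights

reduced = project(doc_vec, matrix)
print(len(reduced))  # → 50
```

The same fixed matrix must be applied to every document and query so that the reduced vectors remain comparable.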
If so, let's imagine I have a big dictionary with 1,000 terms and 10,000 documents in my database. Aren't the vectors extremely big? Do I really store 10,000 vectors with 1,000 values each in my database?
Yes, you do.
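Conceptually that is 10,000 × 1,000 = 10,000,000 values; as 4-byte floats, roughly 40 MB. In practice a document uses only a small fraction of the dictionary, so most of those values are zero, and a common alternative is a sparse representation that stores only the non-zero (term, weight) pairs. A minimal sketch, assuming whitespace tokenisation, raw counts as weights, and cosine similarity as the comparison (the function names are hypothetical):

```python
import math
from collections import Counter

def sparse_vector(doc):
    """Store only the terms that actually occur: {term: count}."""
    return dict(Counter(doc.lower().split()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Iterate over the smaller vector; only shared terms add to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

d1 = sparse_vector("the cat sat")
d2 = sparse_vector("the dog sat on the mat")
print(d1)                                  # → {'the': 1, 'cat': 1, 'sat': 1}
print(round(cosine_similarity(d1, d2), 3))  # → 0.612
```

Storage now grows with the number of distinct terms per document rather than with the dictionary size.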
Can I even store vectors in a MySQL database?
I think it would be possible, but awkward, and retrieval would be slow, as MySQL is not optimised for similarity search over vectors.
Try a specialised vector database such as Weaviate (open source) or Pinecone (commercial), a similarity-search library such as FAISS (open source), or one of the many others.
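If you do want to stay with a relational database, one common layout is to store the sparse vectors as rows of (document, term, weight), keeping only non-zero weights. The sketch below uses Python's built-in SQLite as a stand-in for MySQL; the table and column names are hypothetical, and scoring is a plain dot product (pre-normalise the weights if you want cosine similarity). The index on `term` plays the role of a simple inverted index.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_terms ("
    " doc_id INTEGER NOT NULL,"
    " term TEXT NOT NULL,"
    " weight REAL NOT NULL,"
    " PRIMARY KEY (doc_id, term))"
)
# Index by term so a query only touches rows for the terms it contains.
conn.execute("CREATE INDEX idx_doc_terms_term ON doc_terms (term)")

# Only non-zero weights are stored: one row per (document, term) pair.
rows = [
    (1, "the", 1.0), (1, "cat", 1.0), (1, "sat", 1.0),
    (2, "the", 2.0), (2, "dog", 1.0), (2, "sat", 1.0),
    (2, "on", 1.0), (2, "mat", 1.0),
]
conn.executemany("INSERT INTO doc_terms VALUES (?, ?, ?)", rows)

def score_documents(conn, query_terms):
    """Rank documents by dot product with a query given as {term: weight}."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS query_terms (term TEXT PRIMARY KEY, weight REAL)"
    )
    conn.execute("DELETE FROM query_terms")
    conn.executemany("INSERT INTO query_terms VALUES (?, ?)", query_terms.items())
    return conn.execute(
        "SELECT d.doc_id, SUM(d.weight * q.weight) AS score "
        "FROM doc_terms AS d JOIN query_terms AS q ON d.term = q.term "
        "GROUP BY d.doc_id ORDER BY score DESC, d.doc_id"
    ).fetchall()

print(score_documents(conn, {"cat": 1.0, "sat": 1.0}))  # → [(1, 2.0), (2, 1.0)]
```

This works for a small collection, but every query still aggregates all rows for its terms, which is the kind of workload the specialised tools above are built to handle at scale.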