R function slda.em: Error in structure(.Call("collapsedGibbsSampler", documents, as.integer(K)

y1aodyip  posted on 2023-03-05  in Other

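When I try to run the slda.em function, I get the following error:
Error in structure(.Call("collapsedGibbsSampler", documents, as.integer(K), : word (4297) must be non-negative and less than the number of words (4297)
I don't know how to fix it, because all of my inputs look exactly like the demo inputs (slda).
I have a data frame of project information, and I want to run a topic model on the project descriptions.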

library(stm)  # provides textProcessor() and prepDocuments()
library(lda)  # provides slda.em()

# Below is an example data frame with similar data; the sonnets stand in for project descriptions.

sonnet1 <- c("Shall I compare thee to a summer's day? 
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm'd;
And every fair from fair sometime declines,
By chance or nature's changing course untrimm'd;
But thy eternal summer shall not fade
Nor lose possession of that fair thou owest;
Nor shall Death brag thou wander'st in his shade,
When in eternal lines to time thou growest:
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee.")

sonnet2 <-c("Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds,
Or bends with the remover to remove:
O no! it is an ever-fixed mark
That looks on tempests and is never shaken;
It is the star to every wandering bark,
Whose worth's unknown, although his height be taken.
Love's not Time's fool, though rosy lips and cheeks
Within his bending sickle's compass come:
Love alters not with his brief hours and weeks,
But bears it out even to the edge of doom.
If this be error and upon me proved,
I never writ, nor no man ever loved.")

sonnet3 <- c("When to the sessions of sweet silent thought
I summon up remembrance of things past,
I sigh the lack of many a thing I sought,
And with old woes new wail my dear time's waste:
Then can I drown an eye, unused to flow,
For precious friends hid in death's dateless night,
And weep afresh love's long since cancell'd woe,
And moan the expense of many a vanish'd sight:
Then can I grieve at grievances foregone,
And heavily from woe to woe tell o'er
The sad account of fore-bemoaned moan,
Which I new pay as if not paid before.
But if the while I think on thee, dear friend,
All losses are restored and sorrows end.")

sonnet4 <- c("Full many a glorious morning have I seen
Flatter the mountain-tops with sovereign eye,
Kissing with golden face the meadows green,
Gilding pale streams with heavenly alchemy;
Anon permit the basest clouds to ride
With ugly rack on his celestial face,
And from the forlorn world his visage hide,
Stealing unseen to west with this disgrace:
Even so my sun one early morn did shine
With all triumphant splendor on my brow;
But out, alack! he was but one hour mine;
The region cloud hath mask'd him from me now.
Yet him for this my love no whit disdaineth;
Suns of the world may stain when heaven's sun staineth.")

sonnet5 <- c("That time of year thou mayst in me behold
When yellow leaves, or none, or few, do hang
Upon those boughs which shake against the cold,
Bare ruin'd choirs, where late the sweet birds sang.
In me thou seest the twilight of such day
As after sunset fadeth in the west,
Which by and by black night doth take away,
Death's second self, that seals up all in rest.
In me thou see'st the glowing of such fire
That on the ashes of his youth doth lie,
As the death-bed whereon it must expire
Consumed with that which it was nourish'd by.
This thou perceivest, which makes thy love more strong,
To love that well which thou must leave ere long.")

sonnet6 <- c("To me, fair friend, you never can be old,
For as you were when first your eye I eyed,
Such seems your beauty still. Three winters cold
Have from the forests shook three summers' pride,
Three beauteous springs to yellow autumn turn'd
In process of the seasons have I seen,
Three April perfumes in three hot Junes burn'd,
Since first I saw you fresh, which yet are green.
Ah! yet doth beauty, like a dial-hand,
Steal from his figure and no pace perceived;
So your sweet hue, which methinks still doth stand,
Hath motion and mine eye may be deceived:
For fear of which, hear this, thou age unbred;
Ere you were born was beauty's summer dead.")

sonnet_data <- data.frame(sonnet_number = c(1,2,3,4,5,6),
                          Annotation = c(1,2,2,1,1,2),
                          sonnet = c(sonnet1,sonnet2,sonnet3,sonnet4,sonnet5,sonnet6)) 

processed_sonnet <- textProcessor(sonnet_data$sonnet, metadata = sonnet_data,
                           lowercase = TRUE, removestopwords = TRUE, removenumbers = TRUE,
                           removepunctuation = TRUE, onlycharacter = TRUE)

out_sonnet <- prepDocuments(processed_sonnet$documents, 
                            processed_sonnet$vocab, 
                            processed_sonnet$meta, lower.thresh = 1)

sonnet_docs <- out_sonnet$documents
sonnet_words <- out_sonnet$vocab
sonnet_meta <- out_sonnet$meta
annot <- sonnet_meta$Annotation

topic_sonnet <- 3

sonnet_parameters <- sample(c(-10,10), topic_sonnet, replace = TRUE)

slda_sonnet <- slda.em(documents = sonnet_docs, 
                K = topic_sonnet, 
                vocab = sonnet_words,
                num.e.iterations = 10, 
                num.m.iterations = 4,
                alpha = 1.0, eta = 0.1,
                annotations = annot/100,
                params = sonnet_parameters, 
                variance = 0.25,
                lambda = 1.0, 
                logistic = FALSE, 
                method = "sLDA")

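Running this on the example data produces the same error:
Error in structure(.Call("collapsedGibbsSampler", documents, as.integer(K), : word (39) must be non-negative and less than the number of words (39)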

yxyvkwin #1

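After reading the source file carefully:
https://github.com/cran/lda/blob/master/R/slda.em.R
this error appears when the number of unique words in your documents is greater than or equal to the number of words in your vocabulary. I don't understand why, because the error should only occur when it is *greater than*, not *greater than or equal to*.
Anyway, I found that a quick workaround is to simply add a random word to your vocabulary. For example: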

# Pad the vocabulary with one placeholder entry so its length exceeds the largest word index
voc <- c(your_original_vocab, 'random_word_that_needs_to_exist_for_some_reason')

result <- slda.em(documents = docs,
                  K = 10,
                  vocab = voc,
                  num.e.iterations = 10,
                  num.m.iterations = 4,
                  alpha = 1.0, eta = 0.1,
                  annotations = metadata,
                  params = params,
                  variance = 0.25,
                  lambda = 1.0,
                  logistic = FALSE,
                  method = "sLDA")
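
A possible explanation, offered as an assumption rather than something confirmed in this thread: the lda package documents the word identifiers in `documents` as 0-indexed (an index minus one into `vocab`), while the documents returned by stm's `prepDocuments` appear to use 1-based indices. If that is the case, the largest index equals `length(vocab)`, which would match the numbers in the error messages (4297 and 4297, 39 and 39). Padding the vocabulary would then silence the check but leave every index pointing one entry past its intended word. A minimal sketch of shifting the indices instead, using made-up toy data and a hypothetical helper `to_zero_based`:

```r
# Toy vocabulary and documents; the values are invented for illustration.
# Each document is a 2-row integer matrix: row 1 = word index, row 2 = count.
vocab <- c("summer", "love", "time")
docs_one_based <- list(
  matrix(as.integer(c(1, 2,  3, 1)), nrow = 2),  # word 1 twice, word 3 once
  matrix(as.integer(c(2, 1,  3, 4)), nrow = 2)   # word 2 once, word 3 four times
)

# Hypothetical helper (not part of stm or lda): subtract 1 from row 1
# of every document so the indices start at 0 instead of 1.
to_zero_based <- function(docs) {
  lapply(docs, function(d) {
    d[1, ] <- d[1, ] - 1L  # 1L keeps the matrix integer-typed
    d
  })
}

docs_zero_based <- to_zero_based(docs_one_based)

# Diagnostic: largest word index before and after the shift.
# Before the shift it equals length(vocab); after, it is one less.
max_before <- max(sapply(docs_one_based, function(d) max(d[1, ])))
max_after  <- max(sapply(docs_zero_based, function(d) max(d[1, ])))
```

With the objects from the question, this would amount to passing `to_zero_based(sonnet_docs)` as `documents` while keeping the unmodified `sonnet_words` as `vocab`, so that the topic word labels stay aligned with the indices.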
