As the title says, I have a BigQuery table with 18 million rows, nearly half of which are useless, and I need to assign a topic/niche to each row based on one important column (it contains details about products and websites). I have already tested the NLP API on a sample of 10,000 rows and it did work, but my vanilla approach is to iterate over newarr (the column of important details I get by querying the BigQuery table), sending one cell at a time, waiting for the API's response, and appending it to a results array.
Ideally, I want to do this for all 18 million rows in the shortest possible time. My quota has been raised to 3,000 API requests per minute, which is the maximum I can get, but I cannot figure out how to send 3,000 rows each minute instead of one after another.
results = []
for x in newarr:
    # one blocking API call per row, so throughput is limited by latency
    results.append(sample_classify_text(x))
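For context, newarr is assumed to be built with something like the sketch below; the project, dataset, table, and column names are placeholders, not from the question:

from google.cloud import bigquery

# Hypothetical sketch of how newarr could be pulled from BigQuery.
# `my-project.my_dataset.my_table` and `details` are placeholder names.
bq_client = bigquery.Client()
query = """
    SELECT details
    FROM `my-project.my_dataset.my_table`
    WHERE details IS NOT NULL  -- skip the useless rows up front
"""
newarr = [row["details"] for row in bq_client.query(query).result()]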
sample_classify_text is the function taken straight from the documentation:
# This function will return the category for the text
from google.cloud import language_v1

def sample_classify_text(text_content):
    """
    Classifying Content in a String

    Args:
      text_content: The text content to analyze. Must include at least 20 words.
    """
    client = language_v1.LanguageServiceClient()

    # text_content = 'That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows.'

    # Available types: PLAIN_TEXT, HTML
    type_ = language_v1.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For a list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language = "en"
    document = {"content": text_content, "type_": type_, "language": language}

    response = client.classify_text(request={"document": document})

    # return response.categories
    # Return the name of the first category the API assigns to the document.
    # See the predefined taxonomy of categories:
    # https://cloud.google.com/natural-language/docs/categories
    # Each category also carries category.confidence, a number representing
    # how certain the classifier is that this category fits the text.
    for category in response.categories:
        return category.name
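A minimal sketch of one way to stay under the quota, assuming it is enforced per minute: split newarr into batches of 3,000, fan each batch out over a thread pool, and sleep out the remainder of the minute before starting the next batch. The batch size and worker count below are assumptions to tune, not tested values.

import time
from concurrent.futures import ThreadPoolExecutor

QUOTA_PER_MINUTE = 3000
MAX_WORKERS = 50  # assumed; adjust based on observed request latency

def classify_all(texts):
    results = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for start in range(0, len(texts), QUOTA_PER_MINUTE):
            batch = texts[start:start + QUOTA_PER_MINUTE]
            window_start = time.monotonic()
            # map() preserves input order, so results line up with newarr
            results.extend(pool.map(sample_classify_text, batch))
            # if the batch finished in under a minute, wait out the window
            elapsed = time.monotonic() - window_start
            if elapsed < 60 and start + QUOTA_PER_MINUTE < len(texts):
                time.sleep(60 - elapsed)
    return results

results = classify_all(newarr)

Reusing a single LanguageServiceClient across calls, instead of constructing one inside sample_classify_text on every request, would also cut per-request overhead.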