python - How to use the Google Cloud Natural Language Processing API on a large BigQuery table, or any other topic modeling resource?

Posted by rjee0c15 on 2021-09-08 in Java

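As the title says, I have a BigQuery table with 18 million rows, nearly half of which are useless. I need to assign a topic/niche to each row based on one important column, which contains details about products and websites. I tested the NLP API on a sample of 10,000 rows and it worked well, but my standard approach is to iterate over newarr (the important details column, which I get by querying the BigQuery table), sending one cell at a time, waiting for the API's response, and appending it to a results array.
Ideally, I want to do this for all 18 million rows in as little time as possible. My quota has been raised to 3,000 API requests per minute, which is the maximum I could get, but I can't figure out how to actually send 3,000 rows per minute when I am sending them one after another.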

results = []
i = 0  # number of rows sent so far
for x in newarr:
    i += 1
    results.append(sample_classify_text(x))
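
The loop above waits for each response before sending the next request, so throughput is bounded by per-request latency rather than by the quota. One possible direction, as a rough sketch only: run the calls from a thread pool and space out the request starts so they stay near the per-minute limit. `RateLimiter` and `classify_all` are hypothetical helper names, not part of any Google library, and `classify` stands in for `sample_classify_text`. Retries and quota-error handling are left out.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Spaces out request starts to roughly `per_minute` per minute."""

    def __init__(self, per_minute):
        self.interval = 60.0 / per_minute  # seconds between request starts
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep
        # outside the lock until that slot arrives.
        with self.lock:
            now = time.monotonic()
            slot = max(self.next_slot, now)
            self.next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def classify_all(texts, classify, per_minute=3000, workers=32):
    """Call classify(text) for every text concurrently; results keep input order."""
    limiter = RateLimiter(per_minute)

    def task(text):
        limiter.wait()
        try:
            return classify(text)
        except Exception:  # assumption: failed rows are recorded as None
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, texts))
```

With the names from the question this would be called as `results = classify_all(newarr, sample_classify_text)`. Note that `pool.map` submits every item up front, so for millions of rows it is probably better to feed `newarr` in chunks. Even at the full quota, 18,000,000 rows / 3,000 per minute is 6,000 minutes, about 100 hours, so dropping the useless rows before calling the API matters as much as the concurrency.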

sample_classify_text is a function taken straight from the documentation:


# This function returns the category for the text.

from google.cloud import language_v1

# Create the client once and reuse it across calls; constructing a new
# client for every row adds avoidable overhead.
client = language_v1.LanguageServiceClient()

def sample_classify_text(text_content):
    """
    Classify the content of a string.

    Args:
      text_content: The text content to analyze. Must include at least 20 words.

    Returns:
      The name of the first category returned by the API, or None if the
      API returned no categories.
    """
    # text_content = 'That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows.'

    # Available types: PLAIN_TEXT, HTML
    type_ = language_v1.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language = "en"
    document = {"content": text_content, "type_": type_, "language": language}

    response = client.classify_text(request={"document": document})

    # See the predefined taxonomy of categories:
    # https://cloud.google.com/natural-language/docs/categories
    for category in response.categories:
        # Only the first category is used. category.confidence holds how
        # certain the classifier is that this category represents the text.
        return category.name
    return None
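
Since the docstring above says the text must include at least 20 words, and nearly half of the rows are useless anyway, it may be worth filtering before spending any quota. A minimal sketch, assuming that a whitespace split is a good enough approximation of how the API counts words (it may count tokens differently); `has_enough_words` and `split_classifiable` are hypothetical helper names.

```python
def has_enough_words(text, min_words=20):
    """True if text is a string with at least min_words whitespace-separated words."""
    return isinstance(text, str) and len(text.split()) >= min_words


def split_classifiable(rows, min_words=20):
    """Return (indices, texts) for the rows long enough to send to the API.

    The indices refer to positions in the original rows, so results can be
    written back to the matching rows afterwards.
    """
    kept = [(i, t) for i, t in enumerate(rows) if has_enough_words(t, min_words)]
    indices = [i for i, _ in kept]
    texts = [t for _, t in kept]
    return indices, texts
```

For example, `indices, texts = split_classifiable(newarr)` leaves only the rows worth classifying, and `indices` maps each result back to its position in `newarr`.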
