I am trying to run an Arabic sentiment classifier, but when I run the code I get the following error:
java.io.IOException: (null) entry in command string: null chmod 0644 C:\Spark\spark-3.0.1-bin-hadoop2.7\bin\lrModel.model\metadata\_temporary\0\_temporary\attempt_20201212141604_0144_m_000000_0\part-00000
I searched for solutions, and all of them point to Hadoop and downloading the winutils.exe file, but I have already created the c:\hadoop\bin folder and copied winutils.exe into it. I have also added HADOOP_HOME and %HADOOP_HOME%\bin to my environment variables. Is there any other solution for this case? Here is my code:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import rand
from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, IDF, Tokenizer, StringIndexer
from pyspark.ml.classification import LogisticRegression
# initialize spark context and session
conf = SparkConf().setMaster("local").setAppName("TrainSentimentModel")
sc = SparkContext(conf=conf)
spark = SparkSession(sc)
# read the sentiment tsv file into an rdd and split it based upon tab
lines = sc.textFile("train_Arabic_tweets_20190413.tsv").map(lambda x: x.split("\t"))
# define the schema
schema = StructType([StructField("target", StringType(), True), StructField("tweet", StringType(), True)])
# create dataframe from rdd
docs = spark.createDataFrame(lines, schema)
# split the dataset into training and testing
(train_set, test_set) = docs.orderBy(rand()).randomSplit([0.8, 0.2], seed = 2000)
# define the processing pipeline (tokenize->tf->idf->label_indexer)
tokenizer = Tokenizer(inputCol="tweet", outputCol="words")
hashtf = HashingTF(inputCol="words", outputCol='tf')
idf = IDF(inputCol='tf', outputCol="features", minDocFreq=5) #minDocFreq: remove sparse terms
indexedLabel = StringIndexer(inputCol="target", outputCol="label")
pipeline = Pipeline(stages=[tokenizer, hashtf, idf, indexedLabel])
# apply the pipeline to the training and testing datasets
pipelineFit = pipeline.fit(train_set)
train_df = pipelineFit.transform(train_set)
test_df = pipelineFit.transform(test_set)
# initialize a logistic regression
lr = LogisticRegression(maxIter=100)
# train the classifier on the training dataset
lrModel = lr.fit(train_df)
# save the classifier
lrModel.save('lrModel.model')
# apply the model to the testing data
predictions = lrModel.transform(test_df)
# compute the test set accuracy
accuracy = predictions.filter(predictions.label == predictions.prediction).count() / float(test_set.count())
print("****************************************\n")
print("Test set accuracy " + str(accuracy) + "\n")
print("****************************************\n")
sc.stop()
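One thing I also tried, in case it helps narrow the problem down: setting the Hadoop-related environment variables from Python itself, before the SparkContext is created (a minimal sketch; the `C:\hadoop` path is just where I put winutils.exe, so adjust it for your machine):

```python
import os

# Point Spark/Hadoop at the folder whose bin\ subfolder contains winutils.exe
# (assumption: winutils.exe lives in C:\hadoop\bin; change the path as needed)
os.environ["HADOOP_HOME"] = r"C:\hadoop"

# Also put winutils.exe on PATH so the Hadoop native commands can be found
os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + r"C:\hadoop\bin"
```

These lines have to run before `SparkContext(conf=conf)` is constructed, because the JVM reads the environment at startup; setting them afterwards has no effect.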
Here is my screenshot: