fs.hdfs.impl.disable.cache makes Spark SQL very slow

Posted by 44u64gxh on 2021-05-29 in Hadoop


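This question is related to another one: Hive/Hadoop intermittent failure: Unable to move source to destination.
We found that we could avoid the "Unable to move source ... Filesystem closed" error by setting fs.hdfs.impl.disable.cache to true. However, we also noticed that Spark SQL queries became very slow: queries that used to finish within a few seconds now take 30 to 40 seconds, even when the query is very simple, such as reading a small table.
Is this normal?
My understanding is that setting fs.hdfs.impl.disable.cache to true means FileSystem#get() always calls createFileSystem() instead of returning a cached FileSystem. The setting prevents a FileSystem object from being shared by multiple clients, which makes sense because it stops, for example, two callers of FileSystem#get() from closing each other's file system.
(For example, see this discussion.)
I would expect this setting to cause some slowdown, but probably not this much.
From: Hadoop source code reading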

/**
 * Returns the FileSystem for this URI's scheme and authority. The scheme of
 * the URI determines a configuration property name,
 * <tt>fs.<i>scheme</i>.class</tt> whose value names the FileSystem class.
 * The entire URI is passed to the FileSystem instance's initialize method.
 */
public static FileSystem get(URI uri, Configuration conf)
        throws IOException {
    String scheme = uri.getScheme();
    String authority = uri.getAuthority();

    if (scheme == null) { // no scheme: use default FS
        return get(conf);
    }

    if (authority == null) { // no authority
        URI defaultUri = getDefaultUri(conf);
        // If the scheme matches the default FS and the default has an
        // authority, return the default file system.
        if (scheme.equals(defaultUri.getScheme())
                && defaultUri.getAuthority() != null) {
            return get(defaultUri, conf);
        }
    }

    String disableCacheName = String.format("fs.%s.impl.disable.cache",
            scheme);
    if (conf.getBoolean(disableCacheName, false)) {
        return createFileSystem(uri, conf);
    }

    return CACHE.get(uri, conf);
}
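
To make the effect of the last two branches concrete, here is a minimal sketch in plain Java that models them without Hadoop on the classpath. Everything in it is an assumption made for illustration: FsCacheModel and FakeFileSystem are hypothetical names, the host namenode:8020 is a placeholder, and the cache key is reduced to scheme plus authority (as far as I know, Hadoop's real cache key also takes the calling user into account, and creating a real instance sets up an HDFS client, which is where the actual cost would come from).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the cache decision in FileSystem.get().
// It is NOT Hadoop's implementation; it only counts how often an instance
// would be created with and without the cache.
class FsCacheModel {
    // Stand-in for a FileSystem instance (hypothetical type).
    static final class FakeFileSystem {
        final URI uri;
        FakeFileSystem(URI uri) { this.uri = uri; }
    }

    private final Map<String, FakeFileSystem> cache = new HashMap<>();
    private final boolean disableCache; // models fs.<scheme>.impl.disable.cache
    private int creations = 0;          // number of instances built so far

    FsCacheModel(boolean disableCache) { this.disableCache = disableCache; }

    // Mirrors the last two branches of get(URI, Configuration) quoted above.
    synchronized FakeFileSystem get(URI uri) {
        if (disableCache) {
            return create(uri); // cache bypassed: a fresh instance per call
        }
        // Assumption: key on scheme and authority only.
        String key = uri.getScheme() + "://" + uri.getAuthority();
        FakeFileSystem fs = cache.get(key);
        if (fs == null) {
            fs = create(uri);
            cache.put(key, fs);
        }
        return fs;
    }

    private FakeFileSystem create(URI uri) {
        creations++;
        return new FakeFileSystem(uri);
    }

    synchronized int creations() { return creations; }

    public static void main(String[] args) {
        URI uri = URI.create("hdfs://namenode:8020/tmp"); // placeholder host
        for (boolean disabled : new boolean[] {false, true}) {
            FsCacheModel model = new FsCacheModel(disabled);
            for (int i = 0; i < 3; i++) {
                model.get(uri);
            }
            System.out.println("disableCache=" + disabled
                    + " creations=" + model.creations());
        }
    }
}
```

In this model, three lookups create one instance with the cache enabled and three with it disabled. If a single query calls FileSystem#get() many times, the per-call setup cost would be multiplied in the same way, which may be part of the slowdown described above.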

Could this slowness point to some other network-related problem, such as domain name resolution? Any insight into this issue is welcome.

hadoop Hive hdfs apache-spark-sql

Source: https://stackoverflow.com/questions/48652977/fs-hdfs-impl-disable-cache-caused-sparksql-very-slow
