Getting the file name as the key in a C# streaming MapReduce job using binary mapper.exe and reducer.exe

r6hnlfcb posted on 2021-06-02 in Hadoop

The code below works fine for submitting a streaming job to the cluster.

string statusFolderName = @"/tutorials/wordcountstreaming/status";

var jobcred = new BasicAuthCredential();
jobcred.UserName = "username";
jobcred.Password = "pass";
jobcred.Server = new Uri("https://something.azurehdinsight.net");

// Define the Hadoop streaming MapReduce job
StreamingMapReduceJobCreateParameters myJobDefinition = new StreamingMapReduceJobCreateParameters()
{
    JobName = "my word counting job",
    StatusFolder = statusFolderName,
    Input = "/example/data/gutenberg/davinci.txt",
    Output = "/tutorials/wordcountstreaming/output",
    Reducer = "wc.exe",
    Mapper = "cat.exe"
};

myJobDefinition.Files.Add("/example/apps/wc.exe");
myJobDefinition.Files.Add("/example/apps/cat.exe");

var jobClient = JobSubmissionClientFactory.Connect(jobcred);

// Run the MapReduce job
JobCreationResults mrJobResults = jobClient.CreateStreamingJob(myJobDefinition);
acruukt9 1#

To get the name of the text file being processed by the mapper as the key, read the following environment variable in your mapper:

    string Key = Environment.GetEnvironmentVariable("map_input_file");

Modify your mapper code as follows:

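    using System;
    using System.IO;
    using System.Linq;

    namespace wc
    {
        class wc
        {
            static void Main(string[] args)
            {
                string line;
                var count = 0;

                // Read from a file when a path is passed; otherwise read stdin.
                if (args.Length > 0)
                {
                    Console.SetIn(new StreamReader(args[0]));
                }

                // Count the spaces on each line (ReadLine strips the newline,
                // so the '\n' comparison never matches).
                while ((line = Console.ReadLine()) != null)
                {
                    count += line.Count(cr => (cr == ' ' || cr == '\n'));
                }

                // Path of the input file, as exposed by Hadoop streaming.
                string Key = Environment.GetEnvironmentVariable("map_input_file");

                // Emit the result as key<TAB>value.
                var output = String.Format("{0}\t{1}", Key, count);
                Console.WriteLine(output);
            }
        }
    }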
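A note on the variable name: Hadoop streaming derives it from a job configuration property by replacing dots with underscores, so `map.input.file` becomes `map_input_file`, and newer Hadoop releases name the property `mapreduce.map.input.file` instead. The variable is also, as far as I know, only populated for map tasks, so the program reading it has to be the one passed as `Mapper` in the job definition. Below is a minimal sketch that tries both names and keeps only the last path segment. `InputFile.Name`, the `FileNameMapper` class and the `"unknown"` fallback key are illustrative names, not part of the original answer.

```csharp
using System;

namespace FileNameKeySketch
{
    public static class InputFile
    {
        // getEnv is injected so the lookup can be exercised without a cluster.
        public static string Name(Func<string, string> getEnv)
        {
            // Try the newer variable name first, then the older one.
            string path = getEnv("mapreduce_map_input_file") ?? getEnv("map_input_file");
            if (string.IsNullOrEmpty(path))
            {
                return "unknown"; // assumed placeholder when neither variable is set
            }

            // Keep only the part after the last '/', e.g. "davinci.txt".
            int slash = path.LastIndexOf('/');
            return slash >= 0 ? path.Substring(slash + 1) : path;
        }
    }

    public class FileNameMapper
    {
        public static void Main()
        {
            string key = InputFile.Name(Environment.GetEnvironmentVariable);
            long count = 0;
            string line;

            // Split on any whitespace and ignore empty entries to count words.
            while ((line = Console.ReadLine()) != null)
            {
                count += line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
            }

            // One key<TAB>value line per map task.
            Console.WriteLine("{0}\t{1}", key, count);
        }
    }
}
```

Dropping the `LastIndexOf` step keeps the full path as the key, which may be preferable when files in different folders share a name.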

Hope this helps.

koaltpgm 2#


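using System;
using System.IO;
using System.Linq;

namespace wc
{
    class wc
    {
        static void Main(string[] args)
        {
            string line;
            var count = 0;

            // Read from a file when a path is passed; otherwise read stdin.
            if (args.Length > 0)
            {
                Console.SetIn(new StreamReader(args[0]));
            }

            // Count the spaces on each line (ReadLine strips the newline,
            // so the '\n' comparison never matches).
            while ((line = Console.ReadLine()) != null)
            {
                count += line.Count(cr => (cr == ' ' || cr == '\n'));
            }

            // Only the count is written; no key is emitted yet.
            Console.WriteLine(count);
        }
    }
}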

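How can I get the name of the text file as the key? I want the output to show key-value pairs, where the key is the file name and the value is the number of words in that file. I have multiple files.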
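With multiple files, or a large file that is split across several map tasks, the mapper output can contain more than one `fileName<TAB>count` line for the same file. A reducer along the following lines could add them up, relying on the fact that Hadoop streaming hands the reducer its input sorted by key. This is only a sketch: `CountReducer` is a hypothetical name, and it assumes every mapper line has the `key<TAB>integer` shape (other lines are skipped).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CountReducer
{
    // Sums the values of consecutive lines that share the same key.
    public static IEnumerable<string> Reduce(TextReader input)
    {
        string currentKey = null;
        long total = 0;
        string line;

        while ((line = input.ReadLine()) != null)
        {
            int tab = line.IndexOf('\t');
            if (tab < 0)
            {
                continue; // skip lines without a key/value separator
            }

            string key = line.Substring(0, tab);
            long value;
            if (!long.TryParse(line.Substring(tab + 1), out value))
            {
                continue; // skip lines whose value is not an integer
            }

            // A new key means the previous file's total is complete.
            if (currentKey != null && key != currentKey)
            {
                yield return currentKey + "\t" + total;
                total = 0;
            }

            currentKey = key;
            total += value;
        }

        // Flush the last key.
        if (currentKey != null)
        {
            yield return currentKey + "\t" + total;
        }
    }

    public static void Main()
    {
        foreach (string result in Reduce(Console.In))
        {
            Console.WriteLine(result);
        }
    }
}
```

To try it, the compiled executable would have to be added to `Files` and named as `Reducer` in the job definition, with the key-emitting program set as `Mapper`.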
