PowerShell: how to find the unique lines in a text file?

3z6pesqy  posted on 2022-12-29 in Shell

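I have a very large list of hashes. I need to find out which ones appear only once, since most of them are duplicated.
For example, the last line, 238db2....., appears only once: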

ac6b51055fdac5b92934699d5b07db78
ac6b51055fdac5b92934699d5b07db78
7f5417a85a63967d8bba72496faa997a
7f5417a85a63967d8bba72496faa997a
1e78ba685a4919b7cf60a5c60b22ebc2
1e78ba685a4919b7cf60a5c60b22ebc2
238db202693284f7e8838959ba3c80e8

I tried the following, but it lists one copy of every hash, duplicates included, instead of identifying only the hashes that appear once:

foreach ($line in (Get-Content "C:\hashes.txt" | Select-Object -Unique)) {
  Write-Host "Line '$line' appears $(($line | Where-Object {$_ -eq $line}).count) time(s)."
}

k5ifujac1#

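You can use a Hashtable and a StreamReader.
The StreamReader reads the file line by line, and the Hashtable stores each line as a Key whose Value is $true (if the line is duplicated) or $false (if it is unique).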

$reader = [System.IO.StreamReader]::new('D:\Test\hashes.txt')
$hash   = @{}
while($null -ne ($line = $reader.ReadLine())) {
    $hash[$line] = $hash.ContainsKey($line)
}

# clean-up the StreamReader
$reader.Dispose()

# get the unique line(s) by filtering for value $false
$result = $hash.Keys | Where-Object {-not $hash[$_]}

Given your example, $result will contain 238db202693284f7e8838959ba3c80e8
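
A minimal sketch of the same idea wrapped in a reusable function. The function name Get-SingleLine and the temporary file holding the question's sample hashes are assumptions made for illustration and are not part of the original answer. The try/finally block makes sure the reader is disposed even if reading fails part-way:

```powershell
# Hypothetical helper (name not from the original answer):
# returns the lines of a file that occur exactly once.
function Get-SingleLine {
    param([Parameter(Mandatory)][string]$Path)

    # Resolve to a full path, because .NET does not follow PowerShell's current location
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader   = [System.IO.StreamReader]::new($fullPath)
    $seen     = @{}   # line -> $true if seen more than once, $false otherwise
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            $seen[$line] = $seen.ContainsKey($line)
        }
    }
    finally {
        # Always release the file handle
        $reader.Dispose()
    }
    # Keep only the lines whose "duplicate" flag is still $false
    $seen.Keys | Where-Object { -not $seen[$_] }
}

# Sample data from the question, written to a temporary file
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -LiteralPath $tmp -Value @(
    'ac6b51055fdac5b92934699d5b07db78'
    'ac6b51055fdac5b92934699d5b07db78'
    '7f5417a85a63967d8bba72496faa997a'
    '7f5417a85a63967d8bba72496faa997a'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '238db202693284f7e8838959ba3c80e8'
)
$singles = @(Get-SingleLine -Path $tmp)
Remove-Item -LiteralPath $tmp
$singles
```

Note that a Hashtable created with @{} compares string keys case-insensitively, which should be harmless for hashes that are written in a consistent case.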


0qx6xfy62#

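  • Given that you're dealing with a *large* file, Get-Content is best avoided.
  • A switch statement with the -File parameter allows efficient line-by-line processing and, given that the duplicates already appear to be grouped together, they can be detected by keeping a running count of identical consecutive lines.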
$count = 0 # keeps track of the count of identical lines occurring in sequence
switch -File 'C:\hashes.txt' {
  default {
    if ($prevLine -eq $_ -or $count -eq 0) { # duplicate or first line.
      if ($count -eq 0) { $prevLine = $_ }
      ++$count 
    }
    else { # current line differs from the previous one.
      if ($count -eq 1) { $prevLine } # non-duplicate -> output
      $prevLine = $_
      $count = 1
    }
  }
}
if ($count -eq 1) { $prevLine } # output the last line, if a non-duplicate.

e0uiprwp3#

$values = Get-Content .\hashes.txt # Read the values from the hashes.txt file

$groups = $values | Group-Object | Where-Object { $_.Count -eq 1 } # Group the values by their distinct values and filter for groups with a single value

foreach ($group in $groups) {
    foreach ($value in $group.Values) {
        Write-Host "$value" # Output the value of each group
    }
}
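
For comparison, a more compact sketch of the same Group-Object idea. It assumes the lines are plain strings, so each group's Name is the line itself, and it uses an in-memory array of the question's sample values in place of the file:

```powershell
# Sample data from the question, standing in for: Get-Content .\hashes.txt
$values = @(
    'ac6b51055fdac5b92934699d5b07db78'
    'ac6b51055fdac5b92934699d5b07db78'
    '7f5417a85a63967d8bba72496faa997a'
    '7f5417a85a63967d8bba72496faa997a'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '238db202693284f7e8838959ba3c80e8'
)

# -NoElement keeps only Name and Count for each group and drops the grouped members
$singles = @(
    $values |
        Group-Object -NoElement |
        Where-Object Count -eq 1 |
        ForEach-Object Name
)
$singles
```

Skipping the group members with -NoElement may reduce memory use, although Group-Object still has to see every line before it can emit any group.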

To handle very large files, you could try the following.

$chunkSize = 1000 # Number of lines to read per chunk
$counts = @{}     # Line -> number of occurrences, shared across all chunks

# -ReadCount sends the lines down the pipeline in arrays of $chunkSize lines,
# so the file is read only once instead of being re-read for every chunk
Get-Content .\hashes.txt -ReadCount $chunkSize | ForEach-Object {
    # $_ holds the current chunk of lines
    foreach ($group in ($_ | Group-Object)) {
        # Add this chunk's count to the running total for the line, so that
        # duplicates landing in different chunks are still detected
        $counts[$group.Name] = [int]$counts[$group.Name] + $group.Count
    }
}

# Output the lines that appear exactly once in the whole file
foreach ($entry in $counts.GetEnumerator()) {
    if ($entry.Value -eq 1) { Write-Host $entry.Key }
}

Or this:

# Create an empty dictionary
$dict = New-Object System.Collections.Hashtable

# Read the file line by line
foreach ($line in Get-Content .\hashes.txt) {
    # Check if the line is already in the dictionary
    if ($dict.ContainsKey($line)) {
        # Increment the value of the line in the dictionary
        $dict.Item($line) += 1
    } else {
        # Add the line to the dictionary with a count of 1
        $dict.Add($line, 1)
    }
}

# Filter the dictionary for values with a count of 1
$singles = $dict.GetEnumerator() | Where-Object { $_.Value -eq 1 }

# Output the values of the single items
foreach ($single in $singles) {
    Write-Host $single.Key
}
