PowerShell: how to find the unique lines in a text file?

Posted by 3z6pesqy on 2022-12-29 in Shell

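I have a large list of hashes. Most of them are duplicates, and I need to find the ones that appear only once.
For example, the last line below, 238db2....., appears only once: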

ac6b51055fdac5b92934699d5b07db78
ac6b51055fdac5b92934699d5b07db78
7f5417a85a63967d8bba72496faa997a
7f5417a85a63967d8bba72496faa997a
1e78ba685a4919b7cf60a5c60b22ebc2
1e78ba685a4919b7cf60a5c60b22ebc2
238db202693284f7e8838959ba3c80e8

I tried the following, but it just lists one copy of every hash instead of identifying only the hashes that appear once:

foreach ($line in (Get-Content "C:\hashes.txt" | Select-Object -Unique)) {
  Write-Host "Line '$line' appears $(($line | Where-Object {$_ -eq $line}).count) time(s)."
}

powershell

Source: https://stackoverflow.com/questions/74748715/how-to-find-unique-line-in-a-txt-file

3 Answers

k5ifujac 1#

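You can use a Hashtable and a StreamReader.
The StreamReader reads the file line by line, and the Hashtable stores each line as a Key whose Value is $true if the line is duplicated or $false if it is unique.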

$reader = [System.IO.StreamReader]::new('D:\Test\hashes.txt')
$hash   = @{}
while($null -ne ($line = $reader.ReadLine())) {
    $hash[$line] = $hash.ContainsKey($line)
}

# clean-up the StreamReader
$reader.Dispose()

# get the unique line(s) by filtering for value $false
$result = $hash.Keys | Where-Object {-not $hash[$_]}

With your example, $result will contain 238db202693284f7e8838959ba3c80e8
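
If the number of occurrences is also of interest, as in the output the question was trying to print, the same StreamReader loop can store a count instead of a boolean. This is only a sketch: the function name Get-LineCount and the temporary sample file are assumptions made for illustration and are not part of the answer above.

```powershell
# Hypothetical helper (the name is an assumption): counts how often each line occurs.
function Get-LineCount {
    param([Parameter(Mandatory)][string] $Path)

    $counts = @{}
    # .NET resolves relative paths against the process working directory, which can
    # differ from the current PowerShell location, so resolve the path first.
    $reader = [System.IO.StreamReader]::new((Convert-Path -LiteralPath $Path))
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            # A missing key yields $null, and $null + 1 evaluates to 1.
            $counts[$line] = $counts[$line] + 1
        }
    }
    finally {
        # Release the file handle even if reading fails.
        $reader.Dispose()
    }
    $counts
}

# Sample data from the question, written to a temporary file.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'hashes-count-sample.txt'
@(
    'ac6b51055fdac5b92934699d5b07db78'
    'ac6b51055fdac5b92934699d5b07db78'
    '7f5417a85a63967d8bba72496faa997a'
    '7f5417a85a63967d8bba72496faa997a'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '238db202693284f7e8838959ba3c80e8'
) | Set-Content -LiteralPath $sample

$counts  = Get-LineCount -Path $sample
# Keep only the lines that were seen exactly once.
$uniques = @($counts.Keys | Where-Object { $counts[$_] -eq 1 })
$uniques
```

For the sample data, $uniques holds the single hash 238db202693284f7e8838959ba3c80e8. Note that a hashtable created with @{} compares string keys case-insensitively, which is harmless for lowercase hex hashes but worth knowing for other input.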

0qx6xfy6 2#

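Given that you're dealing with a large file, Get-Content is best avoided.
A switch statement with the -File parameter allows efficient line-by-line processing and, given that the duplicates appear to be grouped together already, they can be detected by keeping a running count of identical consecutive lines.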

$count = 0 # keeps track of the count of identical lines occurring in sequence
switch -File 'C:\hashes.txt' {
  default {
    if ($prevLine -eq $_ -or $count -eq 0) { # duplicate or first line.
      if ($count -eq 0) { $prevLine = $_ }
      ++$count 
    }
    else { # current line differs from the previous one.
      if ($count -eq 1) { $prevLine } # non-duplicate -> output
      $prevLine = $_
      $count = 1
    }
  }
}
if ($count -eq 1) { $prevLine } # output the last line, if a non-duplicate.
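
To make the adjacency assumption concrete, the same run-counting idea can be wrapped in a function and run against two small inputs. This is a sketch under stated assumptions: the function name Get-SingleRunLine and the temporary file names are hypothetical, and the loop is a slightly restructured variant of the code above rather than a copy of it.

```powershell
# Hypothetical helper (the name is an assumption): outputs every line whose run of
# identical consecutive lines has length 1.
function Get-SingleRunLine {
    param([Parameter(Mandatory)][string] $Path)

    $fullPath = Convert-Path -LiteralPath $Path
    $count    = 0      # length of the current run of identical lines
    $prevLine = $null  # the line that the current run consists of
    switch -File $fullPath {
        default {
            if ($count -gt 0 -and $_ -eq $prevLine) {
                ++$count                          # same line again: extend the run
            }
            else {
                if ($count -eq 1) { $prevLine }   # previous run had length 1: output it
                $prevLine = $_
                $count    = 1
            }
        }
    }
    if ($count -eq 1) { $prevLine }               # handle the final run
}

$tmp = [System.IO.Path]::GetTempPath()

# Duplicates grouped together, as in the question: only the last line is reported.
$grouped = Join-Path $tmp 'hashes-grouped-sample.txt'
'aa', 'aa', 'bb', 'bb', 'cc' | Set-Content -LiteralPath $grouped
$groupedResult = @(Get-SingleRunLine -Path $grouped)      # cc

# Duplicates NOT adjacent: every run has length 1, so 'aa' is reported twice.
$scattered = Join-Path $tmp 'hashes-scattered-sample.txt'
'aa', 'bb', 'aa' | Set-Content -LiteralPath $scattered
$scatteredResult = @(Get-SingleRunLine -Path $scattered)  # aa, bb, aa
```

The second input shows the trade-off: the approach needs no lookup table and uses constant memory, but it is only correct when identical lines are adjacent. If that cannot be guaranteed, a count-based approach such as a hashtable is the safer choice.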

e0uiprwp 3#

$values = Get-Content .\hashes.txt # Read the values from the hashes.txt file

$groups = $values | Group-Object | Where-Object { $_.Count -eq 1 } # Group the values by their distinct values and filter for groups with a single value

foreach ($group in $groups) {
    foreach ($value in $group.Values) {
        Write-Host "$value" # Output the value of each group
    }
}
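
The grouping step and the nested loops above can also be collapsed into a single pipeline. The following is a sketch that assumes the file fits comfortably in memory; the temporary sample file is an assumption for illustration. The -NoElement switch tells Group-Object to keep only each group's name and count, not the grouped items themselves.

```powershell
# Sample data from the question, written to a temporary file.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'hashes-group-sample.txt'
@(
    'ac6b51055fdac5b92934699d5b07db78'
    'ac6b51055fdac5b92934699d5b07db78'
    '7f5417a85a63967d8bba72496faa997a'
    '7f5417a85a63967d8bba72496faa997a'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '238db202693284f7e8838959ba3c80e8'
) | Set-Content -LiteralPath $sample

$singles = @(
    Get-Content -LiteralPath $sample |
        Group-Object -NoElement |          # one group per distinct line, counts only
        Where-Object { $_.Count -eq 1 } |  # keep groups that contain a single line
        ForEach-Object { $_.Name }         # the group name is the line text
)
$singles
```

For the sample data this outputs 238db202693284f7e8838959ba3c80e8.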

To handle very large files, you can try the following.

$chunkSize = 1000 # Number of lines to read per chunk
$counts = @{}     # Occurrence count per line, accumulated across all chunks

# -ReadCount makes Get-Content emit arrays of up to $chunkSize lines, so the file
# is read only once instead of being re-read from the start for every chunk
Get-Content .\hashes.txt -ReadCount $chunkSize | ForEach-Object {
    # Group the lines of this chunk and add each group's count to the running total
    foreach ($group in ($_ | Group-Object)) {
        $counts[$group.Name] += $group.Count
    }
}

# A line is unique only if its total count over the whole file is 1;
# filtering per chunk would miss duplicates that fall into different chunks
$counts.GetEnumerator() | Where-Object { $_.Value -eq 1 } | ForEach-Object {
    Write-Host $_.Key
}

Or this:

# Create an empty dictionary
$dict = New-Object System.Collections.Hashtable

# Read the file line by line
foreach ($line in Get-Content .\hashes.txt) {
    # Check if the line is already in the dictionary
    if ($dict.ContainsKey($line)) {
        # Increment the value of the line in the dictionary
        $dict[$line] += 1
    } else {
        # Add the line to the dictionary with a count of 1
        $dict.Add($line, 1)
    }
}

# Filter the dictionary for values with a count of 1
$singles = $dict.GetEnumerator() | Where-Object { $_.Value -eq 1 }

# Output the values of the single items
foreach ($single in $singles) {
    Write-Host $single.Key
}
