haskell 为什么并行化这些代码在六核机器上几乎没有性能提升?

eqqqjvef  于 2023-02-04  发布在  其他
关注(0)|答案(1)|浏览(209)

我正在Haskell学习Simon Marlow的书中的并行编程。在关于并行化数独解算器的章节中,我决定使用回溯算法编写自己的解算器。问题是,当我尝试在6个内核中分配6个案例时,几乎没有性能增益。当我尝试用更多的情况来做示例时,我得到了更显著的性能提升,但仍然远远低于理论上的最大值,该值应该在5和6.我知道有些情况下运行速度可能会慢得多,但线程范围图显示没有理由获得如此小的收益。有人能解释一下我做错了什么吗?也许ST线程有一些我不理解的地方?
下面是代码:
Sudoku.hs

{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}

module Sudoku (getSudokus, solve) where

import Data.Vector(Vector, (!), generate, thaw, freeze)
import Data.List ( nub )
import qualified Data.Vector.Mutable as MV
import Text.Trifecta
import Control.Monad ( replicateM, when )
import Control.Applicative ((<|>))
import Control.Monad.ST
import Control.DeepSeq (NFData)
import GHC.Generics (Generic)

data Cell = Given Int
        | Filled Int
        | Empty
        deriving (Generic, NFData)

newtype Sudoku = Sudoku (Vector Cell)
    deriving (Generic, NFData)

instance Show Cell where
    show Empty = "   "
    show (Filled x) = " " ++ show x ++ " " 
    show (Given x) = "[" ++ show x ++ "]"

instance Show Sudoku where
    show (Sudoku vc) = "\n" ++
                    "+ -  -  - + -  -  - + -  -  - +" ++ "\n" ++
                    "|" ++ i 0 ++ i 1 ++ i 2 ++ "|" ++ i 3 ++ i 4 ++ i 5 ++ "|" ++ i 6 ++ i 7 ++ i 8 ++ "|" ++ "\n" ++
                    "|" ++ i 9 ++ i 10 ++ i 11 ++ "|" ++ i 12 ++ i 13 ++ i 14 ++ "|" ++ i 15 ++ i 16 ++ i 17 ++ "|" ++ "\n" ++
                    "|" ++ i 18 ++ i 19 ++ i 20 ++ "|" ++ i 21 ++ i 22 ++ i 23 ++ "|" ++ i 24 ++ i 25 ++ i 26 ++ "|" ++ "\n" ++
                    "+ -  -  - + -  -  - + -  -  - +" ++ "\n" ++
                    "|" ++ i 27 ++ i 28 ++ i 29 ++ "|" ++ i 30 ++ i 31 ++ i 32 ++ "|" ++ i 33 ++ i 34 ++ i 35 ++ "|" ++ "\n" ++
                    "|" ++ i 36 ++ i 37 ++ i 38 ++ "|" ++ i 39 ++ i 40 ++ i 41 ++ "|" ++ i 42 ++ i 43 ++ i 44 ++ "|" ++ "\n" ++
                    "|" ++ i 45 ++ i 46 ++ i 47 ++ "|" ++ i 48 ++ i 49 ++ i 50 ++ "|" ++ i 51 ++ i 52 ++ i 53 ++ "|" ++ "\n" ++
                    "+ -  -  - + -  -  - + -  -  - +" ++ "\n" ++
                    "|" ++ i 54 ++ i 55 ++ i 56 ++ "|" ++ i 57 ++ i 58 ++ i 59 ++ "|" ++ i 60 ++ i 61 ++ i 62 ++ "|" ++ "\n" ++
                    "|" ++ i 63 ++ i 64 ++ i 65 ++ "|" ++ i 66 ++ i 67 ++ i 68 ++ "|" ++ i 69 ++ i 70 ++ i 71 ++ "|" ++ "\n" ++
                    "|" ++ i 72 ++ i 73 ++ i 74 ++ "|" ++ i 75 ++ i 76 ++ i 77 ++ "|" ++ i 78 ++ i 79 ++ i 80 ++ "|" ++ "\n" ++
                    "+ -  -  - + -  -  - + -  -  - +" ++ "\n"
                    where i x = show (vc ! x)

parseSudoku :: Parser Sudoku
parseSudoku = do
            lst <- replicateM 81 field
            (newline *> return ()) <|> eof
            return $ Sudoku $ generate 81 (lst !!)
        where field = (char '.' >> return Empty) <|> (Given . read . return <$> digit)

getSudokus :: String -> Maybe [Sudoku]
getSudokus raw = case parseString (some parseSudoku) mempty raw of
                    Success ss -> Just ss
                    Failure _ -> Nothing

data Direction = Back | Forward

solve :: Sudoku -> Maybe Sudoku
solve sudoku@(Sudoku puzzle) =  if isValid sudoku then
                                Just $ runST $ do
                                    puzzle' <- thaw puzzle
                                    go puzzle' 0 Forward
                                    Sudoku <$> freeze puzzle'
                                else Nothing
                                where go _ 81 _ = return ()
                                    go vector position direction = do
                                        cell <- MV.read vector position
                                        case (cell, direction) of
                                            (Empty, Back) -> error "Calling back Empty cell, this should not ever occur"
                                            (Empty, Forward) -> MV.write vector position (Filled 1) >> go vector position Forward
                                            (Given _, Back) -> go vector (position-1) Back
                                            (Given _, Forward) -> go vector (position+1) Forward
                                            (Filled 10, Back) -> MV.write vector position Empty >> go vector (position-1) Back
                                            (Filled 10, Forward) -> go vector position Back
                                            (Filled x, Forward) -> do
                                                let (r, c, s) = calculatePositions position
                                                row <- getRowMV r vector
                                                col <- getColumnMV c vector
                                                sqr <- getSquareMV s vector
                                                if isUnique row && isUnique col && isUnique sqr
                                                    then go vector (position+1) Forward
                                                    else MV.write vector position (Filled (x+1)) >> go vector position Forward
                                            (Filled x, Back) -> MV.write vector position (Filled (x+1)) >> go vector position Forward 
                                    

calculatePositions :: Int -> (Int, Int, Int)
calculatePositions i = let (row, col) = divMod i 9
                        sqr = (row `div` 3)*3 + (col `div` 3)
                        in (row, col, sqr)

isValid :: Sudoku -> Bool
isValid sudoku = go 0
            where go 9 = True
                go i = isUnique (getRow i sudoku) && isUnique (getColumn i sudoku) && isUnique (getSquare i sudoku) && go (i+1)

getRow :: Int -> Sudoku -> [Cell]
getRow l (Sudoku vector) = go 0
            where go 9 = []
                go c = vector ! (l*9 + c) : go (c+1)

getRowMV :: MV.PrimMonad m => Int -> MV.MVector (MV.PrimState m) Cell -> m [Cell]
getRowMV l mv = go 0
            where go 9 = return []
                go c = do
                    n <- MV.read mv (l*9 + c)
                    rl <- go (c+1)
                    return (n:rl)

getColumn :: Int -> Sudoku -> [Cell]
getColumn c (Sudoku vector) = go 0
            where go 9 = []
                go i = vector ! (c + i*9) : go (i+1)

getColumnMV :: MV.PrimMonad m => Int -> MV.MVector (MV.PrimState m) Cell -> m [Cell]
getColumnMV c mv = go 0
            where go 9 = return []
                go i = do
                    n <- MV.read mv (c + i*9)
                    rl <- go (i+1)
                    return (n:rl)

getSquare :: Int -> Sudoku -> [Cell]
getSquare q (Sudoku vector) = let (y, x) = quotRem q 3
                                start = x*3 + y*3*9
                            in [ vector ! start, vector ! (start + 1), vector ! (start + 2)
                                , vector ! (start + 9), vector ! (start + 10), vector ! (start + 11)
                                , vector ! (start + 18), vector ! (start + 19), vector ! (start + 20)]

getSquareMV :: MV.PrimMonad m => Int -> MV.MVector (MV.PrimState m) a -> m [a]
getSquareMV q mv = let (y, x) = quotRem q 3
                    start = x*3 + y*3*9
                    in do
                        a1 <- MV.read mv start
                        a2 <- MV.read mv (start +  1)
                        a3 <- MV.read mv (start +  2)
                        b1 <- MV.read mv (start +  9)
                        b2 <- MV.read mv (start + 10)
                        b3 <- MV.read mv (start + 11)
                        c1 <- MV.read mv (start + 18)
                        c2 <- MV.read mv (start + 19)
                        c3 <- MV.read mv (start + 20)
                        return [a1,a2,a3,b1,b2,b3,c1,c2,c3]


isUnique :: [Cell] -> Bool
isUnique xs =  let sv = strip xs
                in length sv == length (nub sv)
            where strip (Empty:xs) = strip xs
                strip ((Given x):xs) = x : strip xs
                strip ((Filled x):xs) = x : strip xs
                strip [] = []

Main.hs

module Main where

import Control.Parallel.Strategies
import Control.Monad
import Control.DeepSeq ( force )
import Sudoku
import System.Environment (getArgs)

main :: IO ()
main = do
    filename <- head <$> getArgs
    contents <- readFile filename
    case getSudokus contents of
        Just sudokus -> print $ runEval $ do
            start <- forM sudokus (rpar . force . solve)
            forM start rseq
        Nothing -> putStrLn "Error during parsing"

我正在编译它与以下标志:
温室气体备选办法:- O2-rtsopts-线程-事件日志
使用以下标志执行
阴谋集团执行数独--数独17.6.txt + RTS-N1-s-l
给出了以下性能报告和线程范围图
在堆中分配了950,178,477,200字节
GC期间复制了181,465,696字节
121,832字节最大驻留时间(7个样本)
30,144字节最大斜率
使用中的总内存为7 MiB(0 MB由于碎片而丢失)
总时间(经过时间)平均暂停时间最大暂停时间
第0代227776个线圈,0个标准杆1.454s 1.633s 0.0000s 0.0011s
第1代7个线圈,0个标准差0.001秒0.001秒0.0001秒0.0002秒
任务:4(1个界限,3个高峰期工人(共3个),使用-N1)
Spark:6(0转换,0溢出,0哑弹,0 GC'd,6失败)
初始化时间0.001s(经过0.001s)
MUT时间220.452秒(经过220.037秒)
GC时间1.455s(经过1.634s)
退出时间0.000s(经过0.008s)

    • 总时间为221.908秒(经过221.681秒)**

分配速率4,310,140,685字节/MUT秒
总用户的99.3%,总耗时的99.3%

并行执行:
阴谋集团执行数独--数独17.6.txt + RTS-N6-s-l
在堆中分配了950,178,549,616字节
GC期间复制了325,450,104字节
142,704字节最大驻留时间(7个样本)
82,088字节最大斜率
正在使用的总内存为32 MiB(0 MB由于碎片而丢失)
总时间(经过时间)平均暂停时间最大暂停时间
第0代128677个线圈,128677个部件37.697秒30.612秒0.0002秒0.0035秒
第1代7个线圈,6个标准杆0.005s 0.004s 0.0006s 0.0012s
并行GC工作平衡:11.66%(连续0%,完美100%)
任务:14(1名上班族,13名高峰期工人(共13名),使用-N6)
Spark:6(5转换,0溢出,0哑弹,0 GC'd,1失败)
初始化时间0.010s(经过0.009s)
MUT时间355.227秒(经过184.035秒)
GC时间37.702秒(经过30.616秒)
退出时间0.001s(经过0.007s)

    • 总时间为392.940秒(经过214.667秒)**

分配速率2,674,847,755字节/MUT秒
生产效率占总用户的90.4%,占总用时的85.7%

以下是数独17.6.txt的内容
.......2143.......6........2.15..........637...........68...4.....23........7....
.......241..8.............3...4..5..7.....1......3.......51.6....2....5..3...7...
.......24....1...........8.3.7...1..1..8..5.....2......2.4...6.5...7.3...........
.......23.1..4....5........1.....4.....2...8....8.3.......5.16..4....7....3......
.......21...5...3.4..6.........21...8.......75.....6.....4..8...1..7.....3.......
.......215.3......6...........1.4.6.7.....5.....2........48.3...1..7....2........

slhcrj9b

slhcrj9b1#

信不信由你,你的问题可能与并行化无关。以后我建议你先看看你要并行化的函数的输入。结果你总是尝试一个谜题。

编辑- @Noughtmare指出,根据问题中发布的Threadscope结果,存在一些并行化。这是真的,这让我相信问题中发布的文件与用于创建结果的文件并不完全匹配。如果是这种情况,那么您可以跳到并行化部分来获得有关以下内容的答案:“为什么并行化这些代码在六核计算机上几乎不会产生性能提升?”

解析器

长话短说,您的解析器中有一个bug,如果您问我的真实意见,它实际上是trifecta包文档中的bug,因为它承诺完全消耗输入parseString
将字符串完全解析为结果。
但是它只消耗了第一行,并且成功地返回了结果。但是,老实说,我以前从来没有用过它,所以可能这是预期的行为。
让我们来看看你的解析器:

parseSudoku :: Parser Sudoku
parseSudoku = do
  lst <- replicateM 81 field
  (newline *> return ()) <|> eof
  return $ Sudoku $ generate 81 (lst !!)
  where
    field = (char '.' >> return Empty) <|> (Given . read . return <$> digit)

乍一看,它看起来还不错,直到仔细检查输入。数据行之间的每一个空行也包含一个换行符,但您的解析器最多需要一个:

.......2143.......6........2.15..........637...........68...4.....23........7....
<this is also a newline>
.......241..8.............3...4..5..7.....1......3.......51.6....2....5..3...7...

所以你的解析器应该是:

many (newline *> return ()) <|> eof

旁注:如果由我来决定,我会这样编写解析器:

parseSudoku :: Parser Sudoku
parseSudoku = do
  (Sudoku <$> V.replicateM 81 field) <* ((() <$ many newline) <|> eof)
  where
    field = (Empty <$ char '.') <|> (Given . Data.Char.digitToInt <$> digit)

并行化

当涉及到并行化的实现时,它似乎工作得很好,但问题是工作负载真的不平衡。这就是为什么使用6个内核时只有大约2倍的速度提升。换句话说,并不是所有谜题的难度都一样。因此,使用6个内核并行解决6个谜题,最多也只能获得最长解的性能。因此,要从并行化中获得更多好处,你要么需要更多的谜题,要么需要更少的CPU内核;)

编辑:下面是一些基准测试来支持我上面的解释。

以下是解决每个谜题的结果:

这两个分别是使用单核和六核的串行和并行求解器。

正如你所看到的,解决第二个带有索引1的谜题花费了最长的时间,在我的电脑上花费了100秒多一点。这也是并行算法解决所有谜题所花费的时间。这是有道理的,因为所有其他5个谜题都解决得更快,而且那些被释放的内核没有其他工作要做。
同样作为一个健全的检查,如果你总结了单独的时间,它需要解决的难题,它将匹配相当不错的总时间,它需要解决所有的顺序。

相关问题