Running parallel loads into a ClickHouse Docker container from PowerShell

iszxjhcz  posted on 2021-07-15 in ClickHouse

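I'm trying to speed up loading data, via PowerShell, into ClickHouse hosted in Docker on Windows 10, and I'd like to know whether I can use parallel processes to load 4 files at the same time. I'd appreciate help understanding whether this is possible, or some pointers on how to approach it. Here is the script I currently use to load the data: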

$files = Get-ChildItem "my_directory" | Sort-Object

foreach ($f in $files){
    $outfile = $f.FullName | Write-Host
    Get-Date | Write-Host
    "Start loading " + $f.FullName | Write-Host
    cat $f.FullName | docker run -i --rm --link ch:clickhouse-client yandex/clickhouse-client -m --host ch --query="INSERT INTO my_table FORMAT CSV"
    Get-Date | Write-Host
    "End loading " + $f.FullName | Write-Host
    [GC]::Collect()
}

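I'm loading the files one at a time, and I'd like to load 4 at once. Based on this link:
Running tasks in parallel in PowerShell
I tried to put some code together, but I need a little help to see whether I'm on the right track: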


# I am assuming this is the code block of what to do

$block = {
    $outfile = $f.FullName | Write-Host
    Get-Date | Write-Host
    "Start loading " + $f.FullName | Write-Host
    cat $f.FullName | docker run -i --rm --link ch:clickhouse-client yandex/clickhouse-client -m --host ch --query="INSERT INTO my_table FORMAT CSV"
    Get-Date | Write-Host
    "End loading " + $f.FullName | Write-Host
    [GC]::Collect()
}

# my directory of files

$files = Get-ChildItem "my_directory" | Sort-Object

# Remove all jobs

Get-Job | Remove-Job
$MaxThreads = 4

# Start the jobs. Max 4 jobs running simultaneously.

foreach($f in $files){
    While ($(Get-Job -state running).count -ge $MaxThreads){
        Start-Sleep -Milliseconds 3
    }
    Start-Job -Scriptblock $Block -ArgumentList $f
}

# Wait for all jobs to finish.

While ($(Get-Job -State Running).count -gt 0){
    start-sleep 1
}

# Get information from each job.

foreach($job in Get-Job){
    $info= Receive-Job -Id ($job.Id)
}

# Remove all jobs created.

Get-Job | Remove-Job

I'm new to PowerShell; thanks for your help.

dgjrabp2 #1

I believe I solved this:


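# create my block

# Use an absolute path here: a background job may not start in the caller's working directory.
$direc = "my_direc"
# File names only, so they can be joined to $direc inside the job
$files = Get-ChildItem $direc -Name | Sort-Object
$block = {
    param([string]$file)
    # $using: copies the caller's $direc into the job, which runs in a separate process
    Get-Content (Join-Path $using:direc $file) | docker run -i --rm --link ch:clickhouse-client yandex/clickhouse-client -m --host ch --query="INSERT INTO test FORMAT CSV"
    [GC]::Collect()
}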

# Remove all jobs

Get-Job | Remove-Job
$MaxThreads = 4

# Start the jobs. Max 4 jobs ($MaxThreads) running simultaneously.

foreach($file in $files){
    While ($(Get-Job -state running).count -ge $MaxThreads){
        Start-Sleep -Milliseconds 3
    }
    Start-Job -Scriptblock $Block -ArgumentList $file
}

# Wait for all jobs to finish.

While ($(Get-Job -State Running).count -gt 0){
    start-sleep 1
}

# Get information from each job.

foreach($job in Get-Job){
    # Emit each job's output (and any errors) instead of overwriting a variable
    Receive-Job -Id ($job.Id)
}

# Remove all jobs created.

Get-Job | Remove-Job
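
The throttling loop above can also be wrapped in a small helper. What follows is only a sketch, not part of the original answer: the function name `Invoke-Throttled` and its parameters are hypothetical. It assumes each item is a scalar such as a file path, and it uses `Wait-Job -Any` to block until a slot frees up rather than polling with `Start-Sleep`. It also tracks only the jobs it starts, so it does not need to remove unrelated jobs from the session.

```powershell
# Hypothetical helper (not from the original post): run $ScriptBlock once per
# item as a background job, keeping at most $MaxJobs unfinished jobs at a time.
function Invoke-Throttled {
    param(
        [Parameter(Mandatory)] [object[]]    $Items,
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $MaxJobs = 4
    )

    $jobs = @()
    foreach ($item in $Items) {
        # Jobs that have not reached a finished state yet.
        $active = @($jobs | Where-Object { $_.State -eq 'Running' -or $_.State -eq 'NotStarted' })
        if ($active.Count -ge $MaxJobs) {
            # Block until at least one of the active jobs finishes.
            $null = Wait-Job -Job $active -Any
        }
        # Each item is passed as the single argument of the script block.
        $jobs += Start-Job -ScriptBlock $ScriptBlock -ArgumentList $item
    }

    if ($jobs.Count -gt 0) {
        # Wait for the remaining jobs, emit their output, then remove
        # only the jobs created by this function.
        $null = Wait-Job -Job $jobs
        Receive-Job -Job $jobs
        Remove-Job -Job $jobs
    }
}

# Possible usage with the load command from the post (the directory path is a placeholder):
#
#   $paths = Get-ChildItem 'C:\path\to\my_directory' -File | ForEach-Object { $_.FullName }
#   Invoke-Throttled -Items $paths -MaxJobs 4 -ScriptBlock {
#       param([string]$path)
#       Get-Content $path | docker run -i --rm --link ch:clickhouse-client yandex/clickhouse-client -m --host ch --query="INSERT INTO my_table FORMAT CSV"
#   }
```

Passing full paths as arguments avoids relying on the job's working directory, which can differ from the caller's. Whether four parallel inserts actually load faster depends on the machine and the ClickHouse server, so it is worth timing a run against the sequential script.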
