How to use goroutines while parsing a text file for URLs with regex

Posted by dhxwm5r4 on 2023-01-10 in Go


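I have been given a task to search a text file for URLs using regex and goroutines with a WaitGroup, in the following way: the text should be divided among N workers (goroutines), each goroutine searches its part for https:// URLs, the goroutines are tracked by a WaitGroup, and the final result should be a single slice of strings (URLs) collected from all goroutines.
I am working with a .txt file that holds dozens of things in a single string. Right now I know how to extract a slice of URLs from the text, but not how to divide the text among goroutines...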

package main

import (
    "fmt"
    "os"
    "regexp"
    "sync"
    "time"
)

func Parser1(wg *sync.WaitGroup) {
    time.Sleep((1 * time.Second))
    b, err := os.ReadFile("repitations")
    if err != nil {
        fmt.Print(err)
    }

    str := string(b)

    re := regexp.MustCompile(`(?:https?://)?(?:[^/.]+\.)*google\.com(?:/[^/\s]+)*/?`)
    fmt.Printf("%q\n", re.FindAllString(str, -1))
    wg.Done()
}

func Parser2(wg *sync.WaitGroup) {
    time.Sleep((1 * time.Second))
    b, err := os.ReadFile("repitations")
    if err != nil {
        fmt.Print(err)
    }

    str := string(b)

    re := regexp.MustCompile(`(?:https?://)?(?:[^/.]+\.)*google\.com(?:/[^/\s]+)*/?`)
    fmt.Printf("%q\n", re.FindAllString(str, -1))
    wg.Done()
}
func main() {

    var wg sync.WaitGroup
    wg.Add(2)
    go Parser1(&wg)
    go Parser2(&wg)
    wg.Wait()
    fmt.Println("Well done!")
}


Source: https://stackoverflow.com/questions/75047778/how-to-use-go-routines-while-parsing-text-file-for-urls-with-regex

1 answer


5vf7fwbs1#

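Split up your reading process.
Open the file with os.Open(), then read it sequentially with File.ReadAt().
Pass the length to read and the offset from the start of the file to Parser().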

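// Requires the imports "io", "log", "os" and "sync".
func Parser(wg *sync.WaitGroup, f *os.File, length int64, offset int64) {
    defer wg.Done()
    // Read this worker's chunk: up to length bytes starting at offset.
    content := make([]byte, length)
    n, err := f.ReadAt(content, offset)
    // ReadAt returns io.EOF when the chunk runs past the end of the file,
    // which is expected for the last chunk and is not treated as fatal.
    if err != nil && err != io.EOF {
        log.Fatal(err)
    }
    content = content[:n] // keep only the bytes actually read
    log.Printf("%s", content)
    // ....
}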

Answered on 2023-01-10
