How does Git compute file hashes?

Posted by dm7nw8vv on 2023-08-01 in Git

The SHA1 hash stored in the tree object (as returned by git ls-tree) does not match the SHA1 hash of the file content (as returned by sha1sum):

$ git cat-file blob 4716ca912495c805b94a88ef6dc3fb4aff46bf3c | sha1sum
de20247992af0f949ae8df4fa9a37e4a03d7063e  -

How does Git compute file hashes? Does it compress the content before computing the hash?

Source: https://stackoverflow.com/questions/7225313/how-does-git-compute-file-hashes

6 Answers

41zrol4v1#

Git prefixes the object with "blob ", followed by the length (as a human-readable integer), followed by a NUL character.
Source: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html
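
As a rough illustration of that rule, and of the question about compression, here is a small Python sketch. It builds the header by hand, hashes the uncompressed bytes, and only afterwards compresses them with zlib, which is how loose objects end up on disk. The names blob_store and blob_hash are made up for this example.

```python
import hashlib
import zlib


def blob_store(content: bytes) -> bytes:
    """Return the bytes Git hashes for a blob: header, NUL, then content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return header + content


def blob_hash(content: bytes) -> str:
    """SHA-1 over the uncompressed header plus content."""
    return hashlib.sha1(blob_store(content)).hexdigest()


content = b"Hello, World!\n"   # 14 bytes, including the trailing newline
store = blob_store(content)
print(blob_hash(content))      # 8ab686eafeb1f44702738c8b0f24f2567c36da6d

# Compression comes after hashing: the compressed bytes round-trip to the
# same store, but hashing them directly gives a different value.
compressed = zlib.compress(store)
assert zlib.decompress(compressed) == store
assert hashlib.sha1(compressed).hexdigest() != blob_hash(content)
```

So the answer to "does it compress first?" is no: the hash covers the plain header and content, and compression only affects how the object is stored.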

llmtgqce2#

I am only expanding on @Leif Gruenwoldt's answer and detailing what is in the reference he provided.

Do it yourself

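Step 1. Create an empty text document (the name does not matter) in your repository.
Step 2. Stage and commit the document.
Step 3. Identify the hash of the blob by running git ls-tree HEAD.
Step 4. Find that the blob's hash is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
Step 5. Recover from your surprise and read below.
How does Git compute its blob hashes?

Blob hash (SHA1) = SHA1("blob " + <size_of_file> + "\0" + <contents_of_file>)

The text blob⎵ is a constant prefix, and \0 is also constant: it is the NUL character. <size_of_file> (the length in bytes, written as a decimal number) and <contents_of_file> vary from file to file.
See: What is the file format of a git commit object?
And that's all, folks!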

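But wait! Did you notice that <filename> is not a parameter of the hash computation? Two files with identical contents have the same hash, regardless of their names or of the date and time they were created. This is one of the reasons Git handles moves and renames better than other version control systems.
Do it yourself (extension)

Step 6. Create another empty file with a different filename in the same directory.
Step 7. Compare the hashes of the two files.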
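
If you would rather not create a repository, the same experiment can be sketched in Python. The file names first.txt and second.txt are hypothetical, and blob_hash is just a helper written for this example:

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """SHA-1 of "blob <size>\\0" followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical empty files: the name never enters the computation,
# only the content (here, zero bytes) does.
files = {"first.txt": b"", "second.txt": b""}
hashes = {name: blob_hash(data) for name, data in files.items()}

for name, digest in hashes.items():
    print(name, digest)
# Both lines show e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```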
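Note:

The link does not mention how a tree object is hashed. I am not sure of the algorithm and its parameters, but from what I have observed it probably computes the hash from all the blobs and trees it contains (probably from their hashes).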
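
For what it is worth, here is a hedged Python sketch of how a tree hash can be reproduced. As far as I know, a tree body is a concatenation of entries of the form "<mode> <name>\0" followed by the 20-byte binary SHA-1 of the entry, and the whole body is hashed with a "tree <size>\0" header, just like a blob. The helper names are mine, and the sorting is simplified (Git's real ordering also treats directory names as if they ended in "/").

```python
import hashlib


def object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 of "<type> <size>\\0" followed by the object body."""
    header = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_hash(entries) -> str:
    """entries: iterable of (mode, name, hex_sha) tuples."""
    body = b""
    # Simplification: entries are sorted by their raw name bytes only.
    for mode, name, hex_sha in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode("ascii") + b" " + name.encode() + b"\0"
        body += bytes.fromhex(hex_sha)   # binary SHA-1, not the hex text
    return object_hash("tree", body)


EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

print(tree_hash([]))    # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
one = tree_hash([("100644", "a.txt", EMPTY_BLOB)])
renamed = tree_hash([("100644", "b.txt", EMPTY_BLOB)])
print(one != renamed)   # True
```

This matches the observation above: file names live in the tree, so renaming a file changes the tree hash while the blob hash stays the same.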

qkf9rpyu3#

git hash-object gives a quick way to verify your test method:

s='abc'
printf "$s" | git hash-object --stdin
printf "blob $(printf "$s" | wc -c)\0$s" | sha1sum

Output:

f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f
f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f  -

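where sha1sum is part of GNU Coreutils.
It then comes down to understanding the format of each object type. We have already covered the trivial blob case; here are the others: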

commit: What is the file format of a git commit object?
tree: What is the internal format of a git tree object?
tag: How is a Git Tag Object SHA1 Created?
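
To make the "same recipe, different body" idea concrete, here is a hedged Python sketch for a commit object. The body is plain text: a tree line, optional parent lines, author and committer lines, a blank line, then the message. The identity and timestamp below are invented, and the helper names are only for illustration.

```python
import hashlib


def object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 of "<type> <size>\\0" followed by the object body."""
    header = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message) -> bytes:
    """Assemble the text body of a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Made-up identity and timestamp, purely for the example.
who = "A U Thor <author@example.com> 1690848000 +0000"

body = commit_body(EMPTY_TREE, [], who, who, "Initial commit\n")
print(object_hash("commit", body))
```

Because the timestamps and identities are part of the body, two otherwise identical commits made at different times get different hashes.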

hk8txs484#

I needed this for some unit tests in Python 3, so I thought I would leave it here.

import hashlib

def git_blob_hash(data):
    if isinstance(data, str):
        data = data.encode()
    data = b'blob ' + str(len(data)).encode() + b'\0' + data
    h = hashlib.sha1()
    h.update(data)
    return h.hexdigest()

I stick to \n line endings everywhere, but in some circumstances Git may change your line endings before computing this hash, so you may also need a .replace('\r\n', '\n').
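
A small sketch of why the line endings matter, assuming the data is already bytes (in which case the arguments to replace must be bytes too). The function repeats the blob recipe so the example runs on its own:

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """SHA-1 of "blob <size>\\0" followed by the data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


crlf = b"Hello, World!\r\n"
lf = crlf.replace(b"\r\n", b"\n")   # normalize CRLF to LF before hashing

print(git_blob_hash(crlf) == git_blob_hash(lf))   # False: the bytes differ
print(git_blob_hash(lf))   # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
```

Whether Git normalizes a given file depends on the repository's end-of-line settings, so the normalization step is only needed when those settings are in play.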

e7arh2l65#

Based on Leif Gruenwoldt's answer, here is a shell function that substitutes for git hash-object:

git-hash-object () { # substitute when the `git` command is not available
    local type=blob
    [ "$1" = "-t" ] && shift && type=$1 && shift
    # depending on eol/autocrlf settings, you may want to substitute CRLFs by LFs
    # by using `perl -pe 's/\r$//g'` instead of `cat` in the next 2 commands
    local size=$(cat "$1" | wc -c | sed 's/ .*$//')
    ( echo -en "$type $size\0"; cat "$1" ) | sha1sum | sed 's/ .*$//'
}

Test:

$ echo 'Hello, World!' > test.txt
$ git hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
$ git-hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d

jvlzgdj96#

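Here is a Python 3 version for hashing binary content (the example above is for text).
For readability, put this code in a def of your own. Also note that this is a snippet, not a complete script: names such as targetFile, fullFile and sha come from surrounding code that is not shown. Take it as inspiration.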

targetSize: int
exists: bool
if os.path.exists(targetFile):
    exists = True
    targetSize = os.path.getsize(targetFile)
else:
    exists = False
    targetSize = 0
openMode: str
if exists:
    openMode = 'br+'
else:
    openMode = 'bw+'
with open(targetFile, openMode) as newfile:
    if targetSize > 0:
        header: str = f"blob {targetSize}\0"
        headerBytes = header.encode('utf-8')
        headBytesLen = len(headerBytes)
        buffer = bytearray(headBytesLen + targetSize)
        buffer[0:0+headBytesLen] = headerBytes
        buffer[headBytesLen:headBytesLen+targetSize] = newfile.read()
        sha1Hash = hashlib.sha1(buffer).hexdigest()
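        if not sha == sha1Hash:
            # Hash mismatch: rewind first, because read() left the position
            # at the end of the file and truncate() cuts at that position.
            newfile.seek(0)
            newfile.truncate()
        else:
            # Hash matches: skip the download (assumes an enclosing loop).
            continue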
    with requests.get(fullFile) as response2:            
        newfile.write(response2.content)
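
For large binary files, a variation that may be worth considering is to feed the hash in chunks instead of copying everything into one buffer. The following is only a sketch: the function name git_blob_hash_file and the 64 KiB chunk size are my own choices, not anything Git prescribes.

```python
import hashlib
import os
import tempfile


def git_blob_hash_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file the way Git hashes a blob, reading it in chunks."""
    h = hashlib.sha1()
    size = os.path.getsize(path)
    h.update(f"blob {size}\0".encode("ascii"))   # header first, then content
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Usage with a throwaway file holding the 14 bytes "Hello, World!\n".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello, World!\n")
print(git_blob_hash_file(tmp.name))   # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
os.remove(tmp.name)
```

The size has to be known before hashing starts because it is part of the header, which is why the sketch asks the file system for it up front.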
