tokenizers: RefMutContainer is unsound

lvmkulzt  posted 4 months ago  in Other

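Since I found unsoundness in this codebase today (see #1485), I decided to take a closer look.
I found that RefMutContainer erases lifetimes, which opens the door to more unsound code.
Source location:
tokenizers/bindings/python/src/utils/mod.rs
Lines 44 to 66 in 14a07b0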
#[derive(Clone)]
pub struct RefMutContainer<T> {
    inner: Arc<Mutex<Option<*mut T>>>,
}
impl<T> RefMutContainer<T> {
    pub fn new(content: &mut T) -> Self {
        Self {
            inner: Arc::new(Mutex::new(Some(content))),
        }
    }

    pub fn map<F: FnOnce(&T) -> U, U>(&self, f: F) -> Option<U> {
        let lock = self.inner.lock().unwrap();
        let ptr = lock.as_ref()?;
        Some(f(unsafe { ptr.as_ref().unwrap() }))
    }

    pub fn map_mut<F: FnOnce(&mut T) -> U, U>(&mut self, f: F) -> Option<U> {
        let lock = self.inner.lock().unwrap();
        let ptr = lock.as_ref()?;
        Some(f(unsafe { ptr.as_mut().unwrap() }))
    }
}
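Note that I did not check whether the existing uses of this type are actually unsound. However, since safety cannot be guaranteed locally here, these methods should at least be marked unsafe, and it would be better still to fix them so that they are correct.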
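To illustrate the "at least mark them unsafe" option, here is a minimal sketch. The type name UncheckedRefContainer, the function demo_unchecked, and the wording of the safety contract are assumptions made for illustration; none of this is code from the tokenizers repository. The pointer is still lifetime-erased, but every dereference now requires the caller to write an unsafe block and accept the contract.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical variant of the container (not from the tokenizers repo):
/// the pointer is still lifetime-erased, but dereferencing it is now the
/// caller's explicit responsibility.
pub struct UncheckedRefContainer<T> {
    inner: Arc<Mutex<Option<*mut T>>>,
}

impl<T> UncheckedRefContainer<T> {
    /// Creating a raw pointer is safe; only dereferencing it is not.
    pub fn new(content: &mut T) -> Self {
        Self {
            inner: Arc::new(Mutex::new(Some(content as *mut T))),
        }
    }

    /// # Safety
    /// The value passed to `new` must still be alive, and nothing else may
    /// access it for the duration of this call.
    pub unsafe fn map<F: FnOnce(&T) -> U, U>(&self, f: F) -> Option<U> {
        let lock = self.inner.lock().unwrap();
        let ptr = (*lock)?;
        // SAFETY: guaranteed by the caller, see the contract above.
        Some(f(unsafe { &*ptr }))
    }

    /// # Safety
    /// Same contract as `map`.
    pub unsafe fn map_mut<F: FnOnce(&mut T) -> U, U>(&mut self, f: F) -> Option<U> {
        let lock = self.inner.lock().unwrap();
        let ptr = (*lock)?;
        // SAFETY: guaranteed by the caller, see the contract above.
        Some(f(unsafe { &mut *ptr }))
    }
}

/// Usage: the caller keeps `text` alive and spells out each `unsafe` block.
pub fn demo_unchecked() -> Option<usize> {
    let mut text = String::from("foo");
    let mut container = UncheckedRefContainer::new(&mut text);
    // SAFETY: `text` is alive and not touched elsewhere during these calls.
    unsafe { container.map_mut(|s| s.push_str("bar")) };
    let len = unsafe { container.map(|s| s.len()) };
    len
}
```

This does not make the type sound by construction. It only moves the obligation to the call site, where it is visible in review.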
Here is an example that Miri reports as undefined behavior:

let container = {
    let mut content = String::from("foo");
    RefMutContainer::new(&mut content)
};

container.map(|text| {
  // Triggers UB
  println!("{text}");
});

Link to Playground:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fede2ac43a1fdb8b5e499b505e9ecdea
Error produced by Miri:

error: Undefined Behavior: out-of-bounds pointer use: alloc1256 has been freed, so this pointer is dangling
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:138:9
    |
138 |         &*ptr::slice_from_raw_parts(data, len)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ out-of-bounds pointer use: alloc1256 has been freed, so this pointer is dangling
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc1256 was allocated here:
   --> src/main.rs:29:20
    |
29  |     let mut text = String::from("foo");
    |                    ^^^^^^^^^^^^^^^^^^^
help: alloc1256 was deallocated here:
   --> src/main.rs:31:5
    |
31  |     drop(text);
    |     ^^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `std::slice::from_raw_parts::<'_, u8>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:138:9: 138:47
    = note: inside `<std::vec::Vec<u8> as std::ops::Deref>::deref` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2831:18: 2831:64
    = note: inside `<std::string::String as std::ops::Deref>::deref` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs:2487:43: 2487:52
    = note: inside `<std::string::String as std::fmt::Display>::fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs:2376:28: 2376:34
    = note: inside `<&std::string::String as std::fmt::Display>::fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:2377:62: 2377:82
    = note: inside `core::fmt::rt::Argument::<'_>::fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:173:76: 173:95
    = note: inside `std::fmt::write` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1178:21: 1178:44
    = note: inside `<std::io::StdoutLock<'_> as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/mod.rs:1823:15: 1823:43
    = note: inside `<&std::io::Stdout as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:786:9: 786:36
    = note: inside `<std::io::Stdout as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:760:9: 760:33
    = note: inside `std::io::stdio::print_to::<std::io::Stdout>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:1116:21: 1116:47
    = note: inside `std::io::_print` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:1226:5: 1226:37
note: inside closure
   --> src/main.rs:35:7
    |
35  |       println!("{text}");
    |       ^^^^^^^^^^^^^^^^^^
note: inside `RefMutContainer::<std::string::String>::map::<{closure@src/main.rs:33:19: 33:25}, ()>`
   --> src/main.rs:18:14
    |
18  |         Some(f(unsafe { ptr.as_ref().unwrap() }))
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `main`
   --> src/main.rs:33:5
    |
33  | /     container.map(|text| {
34  | |       // Triggers UB
35  | |       println!("{text}");
36  | |     });
    | |______^
    = note: this error originates in the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

xhv8bpkk1#

Similar reproduction as above, but showing that it can easily be even more innocent than an explicit drop:

let container = {
    let mut text = String::from("foo");
    RefMutContainer::new(&mut text)
};

a8jjtwal2#

Similar reproduction as above, but showing that it can easily be even more innocent than an explicit drop:

let container = {
    let mut text = String::from("foo");
    RefMutContainer::new(&mut text)
};

Haha, race condition!
I just updated it with your suggestion :)


db2dz4w83#

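Looking at this more closely, I can't think of any situation where this isn't either

  • wildly unsound, or
  • something that could just use a reference internally.

The struct should either

  • use raw pointers throughout and never dereference them, or
  • store a &mut T internally rather than a *mut T.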
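
The second option could be sketched roughly as follows. The names RefMutGuarded and demo_guarded are hypothetical, and this is only an illustration of storing a &mut T with its lifetime intact, not a proposed patch. Whether a lifetime parameter is workable inside the Python bindings is not something this sketch addresses. With this shape, the reproduction above (returning the container from a block that owns the String) is rejected by the borrow checker instead of compiling.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical lifetime-carrying variant (not from the tokenizers repo):
/// the borrow checker ties the container to the value it points at.
pub struct RefMutGuarded<'a, T> {
    inner: Arc<Mutex<Option<&'a mut T>>>,
}

// Manual impl so that cloning only copies the `Arc` and needs no `T: Clone`.
impl<'a, T> Clone for RefMutGuarded<'a, T> {
    fn clone(&self) -> Self {
        Self {
            inner: Arc::clone(&self.inner),
        }
    }
}

impl<'a, T> RefMutGuarded<'a, T> {
    pub fn new(content: &'a mut T) -> Self {
        Self {
            inner: Arc::new(Mutex::new(Some(content))),
        }
    }

    pub fn map<F: FnOnce(&T) -> U, U>(&self, f: F) -> Option<U> {
        let lock = self.inner.lock().unwrap();
        // Reborrow the stored `&mut T` as `&T` for the duration of the call.
        let content = lock.as_deref()?;
        Some(f(content))
    }

    pub fn map_mut<F: FnOnce(&mut T) -> U, U>(&self, f: F) -> Option<U> {
        // The mutex guard provides the exclusive access, so `&self` suffices.
        let mut lock = self.inner.lock().unwrap();
        let content = lock.as_deref_mut()?;
        Some(f(content))
    }
}

/// Usage: no `unsafe` anywhere; `text` must outlive both handles.
pub fn demo_guarded() -> Option<String> {
    let mut text = String::from("foo");
    let container = RefMutGuarded::new(&mut text);
    let shared = container.clone();
    shared.map_mut(|s| s.push_str("bar"));
    let out = container.map(|s| s.clone());
    out
}
```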
