I have code that reads a lot of files in a loop from one location on disk and copies each file to another location. For each file copied, it also inserts a record (containing the filename and other info) into a CopiedFiles table in a SQL Server db (using Entity Framework). The destination path must include a folder whose name includes the id of the inserted CopiedFile (assigned by SQL Server at insert time), so I have to insert into the db first, then copy the file on disk.
I have the File.Copy() inside a try/catch block, so if there is any problem when copying the file, it just continues with the next file. Of course, this creates a problem: in some cases the input file cannot be copied (for example, a corrupted file that throws a cyclic redundancy error when I try to copy it). In these cases, the db record has been inserted but the file won't exist at the destination path on disk.
A simple solution would be to just delete the inserted record from the db when the exception is thrown. But I would like to keep this code so that it only does inserts in the db, not deletes...
So I was thinking another solution could be to check the file, before inserting the row in the db, to make sure I will be able to copy it without errors. Now I am considering possible ways to check this...
- One would be to actually copy the file to a temporary location, then, if this goes OK, delete it from the temporary location, save the row in the db, and copy the original file to the destination. But of course, this would make the process twice as long (and there are usually a lot of files to copy, several thousand).
Could there be any other (more efficient) ways to check this?
4 Answers
44u64gxh 1#
You can do the database insert in a transaction and commit it when the file copy is successful, otherwise roll it back. Something like this pseudo-code:
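A minimal sketch of that idea, assuming an EF6-style context; MyDbContext, CopiedFile, and the id-named destination folder are hypothetical stand-ins for the asker's real model:

```csharp
// Sketch only. MyDbContext / CopiedFile are hypothetical stand-ins.
// Requires: using System.IO;
static bool InsertAndCopy(string sourcePath, string destinationRoot)
{
    using (var db = new MyDbContext())
    using (var tx = db.Database.BeginTransaction())
    {
        try
        {
            var record = new CopiedFile { FileName = Path.GetFileName(sourcePath) };
            db.CopiedFiles.Add(record);
            db.SaveChanges(); // SQL Server assigns record.Id here

            var destDir = Path.Combine(destinationRoot, record.Id.ToString());
            Directory.CreateDirectory(destDir);
            File.Copy(sourcePath, Path.Combine(destDir, record.FileName));

            tx.Commit(); // the row becomes permanent only if the copy worked
            return true;
        }
        catch (Exception) // matches the asker's catch-everything-and-continue pattern
        {
            tx.Rollback(); // copy failed: the insert never becomes visible
            return false;
        }
    }
}
```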
But there's a caveat I must mention: the file copy operation might take some time, and while it runs you might be holding locks on the table participating in the transaction, which might affect other operations. So it might be more prudent to actually delete the rows from the database instead of using a transaction.
xxls0lw8 2#
for example, a corrupted file that throws a cyclic redundancy error when I try to copy it
The standard file copying functionality doesn't do any CRC checks on copied data; a cyclic redundancy error is reported by the disk hardware when it fails to read a sector, not by the copy routine itself. That alone leads me to believe that file systems and streams in general are an unknown subject to you, so I will tailor my answer accordingly.
one would be to actually copy the file to a temporary location, then, if this goes OK, delete it from the temporary location, save the row in the db, and copy the original file to the destination
Even if it didn't take longer, that wouldn't guarantee anything. For example, if your actual destination file is opened by a program in a non-share mode, the temporary copy will succeed and the actual copy won't.
No, your system of inserting rows in your db before the copy happens is just broken. Insert your rows only after the file copies successfully. No record deletion necessary.
Also, you should store the failed files in a list somewhere and retry them after a few seconds. The #1 cause of file copying failures in my experience (besides files open in external editors) is antivirus applications keeping files open for on-demand scanning, and you need to retry after they're done. Of course, if a file fails a few times, feel free to give up on it.
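For instance (a sketch; CopyWithInsert is a hypothetical helper standing in for whatever does the actual copy-plus-insert work, and the three-attempt/five-second numbers are arbitrary):

```csharp
// Requires: using System.Collections.Generic; using System.Threading;
var pending = new List<string>(sourceFiles);
for (int attempt = 0; attempt < 3 && pending.Count > 0; attempt++)
{
    if (attempt > 0)
        Thread.Sleep(TimeSpan.FromSeconds(5)); // give the antivirus time to let go

    var failed = new List<string>();
    foreach (var file in pending)
    {
        try { CopyWithInsert(file); }             // hypothetical copy-then-insert helper
        catch (IOException) { failed.Add(file); } // collect for the next round
    }
    pending = failed; // only the failures are retried
}
// anything still in 'pending' after three attempts is worth giving up on
```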
The best you can do to check whether a file might copy successfully is to just open the destination for writing. Since you presumably don't want to lose its contents on a successful open, you can open it in append mode. But this is only a heuristic; nothing but actually copying the file will tell you whether the copy will succeed.
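That heuristic might look something like this sketch:

```csharp
// Sketch: probe whether the destination can be opened for writing without
// touching its existing contents. Note that FileMode.Append creates the
// file if it doesn't exist yet, which is a side effect to be aware of.
static bool DestinationLooksWritable(string destPath)
{
    try
    {
        using (File.Open(destPath, FileMode.Append, FileAccess.Write))
        {
            return true; // writable right now; proves nothing about later
        }
    }
    catch (IOException) { return false; }                 // locked, bad path, full disk...
    catch (UnauthorizedAccessException) { return false; } // permissions
}
```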
57hvy0tb 3#
So I was thinking another solution could be to check the file, before inserting the row in the db, to make sure I will be able to copy it without errors.
Nope. Don't check whether the file will copy; there are all kinds of problems with that plan. Instead, wait to insert into the DB until afterwards, and check whether the file has copied. Something more like this:
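A sketch of that shape, reusing the hypothetical MyDbContext/CopiedFile model from the first answer. Staging the copy on the destination volume preserves the asker's id-in-the-path requirement, because the final File.Move is then just a rename:

```csharp
// Sketch: copy first, insert only after the copy is known to be good.
static void CopyThenInsert(string sourcePath, string destinationRoot)
{
    var stagingPath = Path.Combine(destinationRoot, Path.GetRandomFileName());
    File.Copy(sourcePath, stagingPath); // throws on CRC errors, locks, etc.

    if (!ValidateCopy(sourcePath, stagingPath)) // hypothetical validation helper
    {
        File.Delete(stagingPath);
        return; // nothing was inserted, so there is nothing to undo in the db
    }

    using (var db = new MyDbContext())
    {
        var record = new CopiedFile { FileName = Path.GetFileName(sourcePath) };
        db.CopiedFiles.Add(record);
        db.SaveChanges(); // id assigned here

        var destDir = Path.Combine(destinationRoot, record.Id.ToString());
        Directory.CreateDirectory(destDir);
        File.Move(stagingPath, Path.Combine(destDir, record.FileName));
    }
}

static bool ValidateCopy(string source, string copy)
{
    // Cheapest meaningful check: the copy exists and the lengths match.
    return File.Exists(copy) &&
           new FileInfo(source).Length == new FileInfo(copy).Length;
}
```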
We could also perhaps simplify this by making the validate method throw instead of returning false:
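Along these lines (same hypothetical helpers as above):

```csharp
// Sketch: the validation throws, so one catch block handles both a failed
// File.Copy and a failed validation in the same way.
static void ValidateCopyOrThrow(string source, string copy)
{
    if (!File.Exists(copy))
        throw new IOException($"Copy of '{source}' is missing.");
    if (new FileInfo(source).Length != new FileInfo(copy).Length)
        throw new IOException($"Copy of '{source}' is truncated.");
}

// Caller, inside the per-file loop:
try
{
    File.Copy(sourcePath, stagingPath);
    ValidateCopyOrThrow(sourcePath, stagingPath);
    InsertRecordAndMove(stagingPath); // hypothetical: the insert-and-rename step above
}
catch (IOException)
{
    // skip this file; no db record was ever created
}
```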
But of course, this raises the concern of using exception handling for logical program flow. It's only the correct way if failing validation really is, well, exceptional.
5q4ezhmt 4#
You won't know if there are any issues until you try the copy. The main issues you will come up against are things like space, permissions, and valid paths.
You could check the space, permissions, paths, etc., but these are all expensive operations. However, with reasonable assumptions about space and likely permissions (or basic validation), you could also try copying a "known" small test file to check that the destination works (you only need to do this periodically per location), and then copy the larger files on the basis that it works?
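A sketch of that periodic probe (the probe file name and its contents are made up):

```csharp
// Sketch: verify a destination is usable by writing and deleting a tiny
// probe file. Catches dead shares, permission problems, and full disks,
// but not per-file issues like a corrupted source.
static bool DestinationIsUsable(string destinationRoot)
{
    var probePath = Path.Combine(destinationRoot, ".copyprobe.tmp"); // made-up name
    try
    {
        File.WriteAllBytes(probePath, new byte[] { 0x42 });
        File.Delete(probePath);
        return true;
    }
    catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
    {
        return false;
    }
}
```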
Apart from basic validation, I would probably just attempt all the operations and log / roll back any changes if there are failures.