Talk:Make a backup file

:::::: Ok, but one of those cases would be an interpreter which was designed to be portable across a variety of platforms. Here, you might have a core that gets you running and then everything else is done in the interpreter. That said, I can see an argument for providing special case support for libc on unix platforms. --[[User:Rdm|Rdm]] 17:53, 10 November 2011 (UTC)
::::::: as i said above, if the core language implements rename/move using external commands then those commands become a direct dependency of the language and are ok to use. presumably such a language will rely on external commands for other things as well and thus it doesn't make much sense to avoid one and leave the others. in a situation where external commands are not allowed by policy, such a language would not be usable anyways. the limitations should only apply to languages where a portable method is not already available and different options could be chosen. in that case the choice should be made according to the restrictions given.--[[User:EMBee|eMBee]] 03:11, 11 November 2011 (UTC)
:::::::: Note that "rename is atomic" [http://stackoverflow.com/questions/167414/is-an-atomic-file-rename-with-overwrite-possible-on-windows assumes unix] (or maybe a recent version of windows and an appropriate file system). --[[User:Rdm|Rdm]] 14:14, 14 November 2011 (UTC)
::::::::: true, but it is only stated as an advantage, not a requirement for this task. even without being atomic, rename is cheaper and thus less likely to fail...--[[User:EMBee|eMBee]] 14:42, 14 November 2011 (UTC)
:::::::::: Note that this still assumes unix -- here's some examples illustrating this point: http://stackoverflow.com/questions/7147577/programmatically-rename-open-file-on-windows and http://stackoverflow.com/questions/1261269/how-to-open-file-in-windows-while-not-blocking-its-renaming --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 21:33, 17 May 2013 (UTC)
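A minimal sketch of doing the rename in-process, in Python purely for illustration (the function name `move_to_backup` and the `.backup` suffix are assumptions). It calls `os.replace()` rather than `os.rename()` because, per the Python documentation, `os.rename()` always raises `FileExistsError` on Windows when the destination exists, while `os.replace()` overwrites it on every platform. The atomicity Python promises for `os.replace()` is the POSIX requirement; it does nothing about the Windows open-file cases linked above.

```python
import os

def move_to_backup(path: str, suffix: str = ".backup") -> str:
    """Rename `path` to `path + suffix` without spawning an external command.

    Returns the backup name. Any older backup with that name is overwritten.
    """
    backup = path + suffix
    # os.replace() overwrites an existing backup on all platforms; on POSIX the
    # rename is atomic, and it may fail if src and dst are on different file systems.
    os.replace(path, backup)
    return backup
```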
 
== why external commands are bad ==
the motivation to avoid external commands can be illustrated by an experience i had just recently:
on a website a framework uses cvs to manage changes to the contents. yesterday i wanted to add something to that site, and i was presented with this error: ''fork() failed with ENOMEM. Out of memory?''. draw your own conclusions...--[[User:EMBee|eMBee]] 03:18, 11 November 2011 (UTC)
 
== Why no copying? ==
 
Backup involves copying, and must do since otherwise it is the same file and will be modified by the subsequent update. (Or alternatively it has to have some very special support from the OS; there's no POSIX operation for “checkpoint this file to this other name without copying” IIRC.) The whole strength of backups comes from copying. –[[User:Dkf|Donal Fellows]] 15:42, 11 November 2011 (UTC)
: This depends on the OS and on the pattern of accesses applications use on the file. Under unix, if anything has the file open for writing, then renaming it means they will update the backup.
:: this is of course a concern, but only if multiple processes deal with a file, which is not the concern of this task. also, if a copy of a file is made while another process writes to it, the problems are no less.--[[User:EMBee|eMBee]] 16:06, 11 November 2011 (UTC)
: But if everything uses the "rename and write new copy" system, then it can be safe (though, of course, there's also the issue of more recent backups overwriting older backups). --[[User:Rdm|Rdm]] 15:48, 11 November 2011 (UTC)
:It seems faster to just rename the file. With copying it goes like this: create the new file (.backup), copy the contents of the old file to the new file, clear the old file, write new data to the old file. Without it goes like this: rename the old file to a new name (.backup), create the old file again (already empty), write new data to the newly created file (with the old name). --[[User:Mwn3d|Mwn3d]] 15:52, 11 November 2011 (UTC)
:good question. thanks for asking. copying is more expensive than rename. copying can fail (due to lack of space, for example). if the machine dies before the copied file is written to the disk, which may be some time after the OS signaled that the copy is complete, and you already started to write to the old file, then both may be lost. rename guarantees that the data is not touched, and thus can hardly be corrupted. and i don't think a rename could cause a file to be deleted if the machine crashes while a rename happens. it's either got the old name or the new one (or in very obscure situations maybe both). as far as i can tell, <code>rename()</code> is posix. at least the <code>rename(2)</code> manpage makes that claim. it is atomic too...--[[User:EMBee|eMBee]] 16:06, 11 November 2011 (UTC)
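The two sequences described in this thread (copy first vs. rename first) can be sketched in Python as follows. The function names are hypothetical, the `.backup` suffix follows the comment above, and the sketch assumes a single process is touching the file.

```python
import os
import shutil

def update_by_copy(path: str, new_data: str, suffix: str = ".backup") -> None:
    """Copy first: the original keeps its name and is rewritten in place."""
    # 1. create the backup and copy the old contents into it (overwrites an older backup)
    shutil.copy2(path, path + suffix)
    # 2. truncate the original ("w" mode) and 3. write the new data into it
    with open(path, "w") as f:
        f.write(new_data)

def update_by_rename(path: str, new_data: str, suffix: str = ".backup") -> None:
    """Rename first: the old data is never read or rewritten, only renamed."""
    # 1. move the old file to the backup name, replacing any older backup
    os.replace(path, path + suffix)
    # 2. create a fresh, empty file under the old name and write the new data
    with open(path, "w") as f:
        f.write(new_data)
```

Note the trade-offs raised above: in `update_by_rename` the new file gets default permissions rather than the original's, and under unix any process that still has the old file open for writing will now be writing into the backup.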
 
== No existing file ==
 
"Some examples on this page assume that the original file already exists. They might fail if some user is trying to create a new file." So, is it a task requirement that solutions should simply create a new file if there is no existing file? That is not something I would read into "In this task you should create a backup file from an existing file..." If this case is desired it should be added as one of the bullet points. &mdash;[[User:Sonia|Sonia]] 23:11, 16 February 2012 (UTC)
:if the file does not exist, it should not be created. it would be nice if the code would fail gracefully if the file is missing, but i don't think this is necessary. it is just a code snippet to solve a particular problem. i'd expect developers to adapt the code if their situation is slightly different. [[Ensure that a file exists]] solves that part for example. no need to repeat it here.--[[User:EMBee|eMBee]] 03:07, 17 February 2012 (UTC)
::Oh good. I'll change the solutions I just posted. &mdash;[[User:Sonia|Sonia]] 03:13, 17 February 2012 (UTC)
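One way to fail gracefully without creating anything, sketched in Python (`update_existing` is a hypothetical name): attempt the rename and treat `FileNotFoundError` as "no original", rather than calling `os.path.exists()` first, which leaves a window in which the file can appear or vanish between the check and the rename.

```python
import os

def update_existing(path: str, new_data: str, suffix: str = ".backup") -> bool:
    """Back up `path` and write `new_data` to it; return False if `path` does not exist."""
    try:
        # Raises FileNotFoundError if there is no original file to back up.
        os.replace(path, path + suffix)
    except FileNotFoundError:
        # Nothing to back up: report it and create no file at all.
        return False
    with open(path, "w") as f:
        f.write(new_data)
    return True
```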
 
== Follow symlinks ==
 
FWIW, following symlinks seems like a really bad idea. It's fine in the context of something like Emacs (which sounds like a possible motivation) with a feature to visit files under their "real" names, but in those cases the user is usually aware of the new name via the UI (as in getting a different buffer name). But for a script this is just wrong, since you get a script which works in a way that can change in the presence of symlinks -- and the whole point of symlinks is to get things to work even when a file is elsewhere. I think that it would be better to simplify this by ignoring symlinks completely, and introduce a separate task for resolving symlinks. --[[User:Elibarzilay|Elibarzilay]] ([[User talk:Elibarzilay|talk]]) 21:01, 17 May 2013 (UTC)
 
::Indeed. A simple application reading/writing files should not normally care (or check) if they are reading/writing via symlinks. The Go code for example is broken since it blindly assumes that any symlink doesn't point to another symlink. There are far too many ways to screw it up unless you really know what you're doing and you really understand symlinks (and how any specific user might choose to use them and want them to behave). IMO it shouldn't be an applications job to make file backups at all (except perhaps as an optional "feature" of an editor or some such; and for example editors like vim have a lot of options related to this so it will do what a user wants; assuming you can blindly look up where a symlink points and mess around in that directory is just bad). &mdash;[[User:dchapes|dchapes]] ([[User talk:dchapes|talk]] | [[Special:Contributions/dchapes|contribs]]) 14:00, 6 September 2014 (UTC)
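To illustrate the failure mode described for the Go solution (in Python, with hypothetical function names): a single `os.readlink()` follows only one hop, while `os.path.realpath()` follows the whole chain. Per the suggestions above, the simplest option is to do neither and just operate on the path as given.

```python
import os

def one_hop(link: str) -> str:
    """Follow a symlink only once, as a naive readlink-based lookup would."""
    target = os.readlink(link)
    # A relative target is relative to the directory containing the link;
    # os.path.join() leaves an absolute target unchanged.
    return os.path.join(os.path.dirname(link), target)

def fully_resolved(link: str) -> str:
    """Follow every link in the chain down to the real file."""
    return os.path.realpath(link)
```

With `link2 -> link1 -> real.txt`, `one_hop("link2")` names `link1`, which is still a symlink, so a backup placed "next to the target" would land beside another link rather than beside the real file.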
 
== Atomicity ==
 
After coming back to this task... the requirements (stated requirements and to some degree implied requirements) stumble over the OS's support for atomic operations on a file system.
 
If atomicity is not an issue (if it's understood that the backup process may produce unintended consequences when some other mechanism is manipulating one or more of the path names being used to "backup" the file), the task is fairly straightforward.
 
If it is an issue, then all sorts of problems arise (for example, the file in question is on a network file system ...).
 
In a "real life" context this requires some sort of external attention (and redundancy -- backups being just one form of redundancy) to catch and recover from the occasional failures. Depending on the context, we wind up with quite a variety of cost/benefit issues.
 
So this winds up being a "best effort" problem, and many of the details are more about the underlying OS and hardware than about the language. It's an interesting problem. (But it's not a great fit as a rosettacode task.) --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 10:29, 19 July 2022 (UTC)
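A best-effort sketch in Python, assuming a local POSIX file system that supports hard links and `fsync()` on directories; it is not expected to give the same guarantees on network file systems or on Windows, and `replace_with_backup` is a hypothetical name. The new contents go to a temporary file in the same directory, the old contents get a second name via a hard link instead of a copy, and `os.replace()` swaps the new file in, so `path` names either the complete old or the complete new contents at every moment.

```python
import os
import shutil
import tempfile

def replace_with_backup(path: str, new_data: bytes, suffix: str = ".backup") -> None:
    """Best-effort update: `path` always names either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    backup = path + suffix
    # 1. Write the new contents to a temporary file in the same directory
    #    (same file system, so the final rename can be atomic).
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_data)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before it becomes visible
        # Copy the original's permission bits; this also fails early with
        # FileNotFoundError if `path` does not exist, before anything else changes.
        shutil.copymode(path, tmp)
        # 2. Give the old contents a second name without copying them.
        #    os.link() will not overwrite, so drop any older backup first.
        try:
            os.remove(backup)
        except FileNotFoundError:
            pass
        os.link(path, backup)
        # 3. Atomically make `path` refer to the new contents.
        os.replace(tmp, path)
        # 4. Persist the directory entries (POSIX-specific; fails on Windows).
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except BaseException:
        # Never leave the temporary file behind (it no longer exists after step 3).
        try:
            os.remove(tmp)
        except FileNotFoundError:
            pass
        raise
```

It is still best effort in the sense described above: the previous backup is deleted before the new link is made, so a crash at that point leaves no backup, and nothing here coordinates with other processes using the same names.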