I'm just listing stuff I encounter in real life, this time about programming related to synchronous task execution.
What is synchronous task execution?
It just means you run different tasks, but these tasks technically depend on each other.
One example, and maybe the best example ever because you'll see this kind of thing a lot:
You have a scheduled script, or even a normal script, which stops software. That is as vague as it can be; it can be ANY kind of software. The thing is: it is an action, it takes time, and it has a "finishing condition". The finishing condition just means: it may succeed, or it may fail. Or it may hang.
Now, what do all the clever admins do? Well, you know the stop takes around 15 minutes to perform (that is, perform with success).
So, at some fixed time, let's say 20h00, we run this stop script.
Why do we stop the software? Ah, that is because we want to do a backup of the software, and with most software you need a cold backup if you want a good backup.
So let's give it some time as a margin; let's say we only start our backup at 21h00, which means we allow 1 hour where we know 15 minutes would be enough.
What is the problem now? The backup task obviously depends on the stopping task. But how do we make sure the backup task finds the software actually down?
Well, we don't. At least, not in this case, we just HOPE it stops in time.
Let me restate that : we HOPE the software stops in time.
What do the very smart admins do (as opposed to the merely clever admins from earlier)? They build in some checks.
These checks cover the fact that we are dealing with synchronous tasks: we acknowledge and respect that the operating condition of task number 2 depends on timely AND correct execution of task number 1.
You would think all of that is quite obvious, and yet I'm sure you'll find many of these setups if you look for them.
In this example it's about stop/backup/start, a typical pattern that recurs a lot. What is the result? Well, often the backup is invalid, because it was taken while (part of) the system was up. Most admins will gloss over this and state it's not an issue. Until the moment a restore needs to occur ... and "there is a problem restoring". If the stop wasn't needed, why was it programmed then?
So how do you do it correctly?
Well, the second step needs to establish whether the first step has completed correctly or not.
Two different statuses typically occur:
- step 1 failed
- step 1 still runs
In the first case, step 1 fails, so logically step 2 doesn't make sense and should not start. The end. Before the next run, you MUST check why the stop script failed, and fix that.
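To make the "step 2 checks step 1" idea concrete, here is a minimal sketch in Python. It assumes a convention that is not part of the setup described above: the stop script writes a single word into a status file ("running" when it starts, then "ok" or "failed" when it finishes). The file and the function names are made up for illustration.

```python
from pathlib import Path

# Assumed convention (hypothetical): the stop script writes one word
# into a status file: "running" at start, then "ok" or "failed" at the end.
KNOWN_STATUSES = {"ok", "failed", "running"}


def read_stop_status(status_file: Path) -> str:
    """Return 'ok', 'failed', 'running', or 'unknown' if the file is missing or odd."""
    try:
        word = status_file.read_text().strip().lower()
    except FileNotFoundError:
        return "unknown"
    return word if word in KNOWN_STATUSES else "unknown"


def backup_may_start(status_file: Path) -> bool:
    """Step 2 only makes sense if step 1 reported a clean stop."""
    return read_stop_status(status_file) == "ok"
```

The backup script would call `backup_may_start(...)` first and refuse to run on anything other than "ok". Treating a missing or unreadable status as "do not start" is a deliberate choice here: no information is not the same as a clean stop.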
Second case, step 1 is still running. That is much more difficult. Ultimately one of 2 things is happening:
- the stop is taking longer, much longer or much much longer
- the stop is hanging
These cases ARE technically different, and you should know which one is happening. The first is easy, and you may just need to give the stop script more time. But it may also be an exceptional case. How do you solve this in a synchronous context? Well, you CAN build lock files, but you should know that ALL processes following this will now be delayed.
Suppose you do a stop at 20h, a backup at 21h and a start at 22h ... Well, the backup and start may now happen at any time. Like 9h in the morning. And you don't want that. The backup will be valid, yes. So when you build lock files, you MUST use some kind of time-out, and that is very complex. Many admins don't dare to build something like this, because it looks too complex. It actually is, but what better solution are you going to get?
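The waiting part of a lock file with a time-out can be sketched roughly like this. It assumes the stop script creates a lock file when it starts and removes it when it finishes; that convention, the function name and the parameters are all made up for illustration, and a real setup needs more than this (a stop script that crashes and leaves a stale lock behind, for one).

```python
import time
from pathlib import Path


def wait_for_lock_release(lock_file: Path, timeout_s: float, poll_s: float = 5.0) -> bool:
    """Wait until lock_file disappears.

    Returns True if the lock was released before the time-out,
    False if it is still there when the time-out expires.
    """
    deadline = time.monotonic() + timeout_s
    while lock_file.exists():
        if time.monotonic() >= deadline:
            return False  # give up: do NOT let the backup drift to 9h in the morning
        time.sleep(poll_s)
    return True
```

The backup scheduled at 21h00 could, for example, call this with a time-out of 30 minutes. If it returns False, the backup is skipped and somebody gets alerted, so the start at 22h00 is not pushed back indefinitely.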
Secondly, the hanging issue. That's a serious problem in itself. Nothing should be hanging (on shutdown, ...), so you must fix this ASAP. If you can't get the hang itself resolved, fix the stop script to include a workaround for it.
Typically that means killing a lot of processes. If your architecture is built correctly, it could mean killing all processes of the user the software runs as (except the stop script itself).
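One way to stop the stop script itself from hanging forever is to run the real stop command under a time-out and tell the three outcomes apart. This is only a sketch: `run_stop` and its arguments are made up for illustration. Note that `subprocess.run` only kills the direct child when the time-out expires; cleaning up the leftover processes of the software's user is system-specific and deliberately left out.

```python
import subprocess
import sys


def run_stop(cmd: list[str], timeout_s: float) -> str:
    """Run the stop command; return 'ok', 'failed', or 'hung'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the direct child at this point.
        # Killing the remaining processes of the software's user would go here.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Simulate a stop command that hangs: it sleeps longer than the time-out.
    fake_stop = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_stop(fake_stop, timeout_s=1))  # prints "hung"
```

With this in place the next step has a real status to look at ("ok", "failed" or "hung") instead of having to guess from the clock.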
Or, you can always hope the stop script stops in time, and hope you don't need to restore the backup. That is also an option.