common mistakes with synchronous task execution

Diputs

I'm just listing stuff I encounter in real life, this time about programming related to synchronous task execution.

What is synchronous task execution?

It just means you run different tasks, but these tasks are technically dependent on each other.
One example, and maybe the best example ever because you'll see this kind of thing a lot:

You have a scheduled script, or even just a normal script, which stops software. That is as vague as it can be; it can be ANY kind of software. The thing is: it is an action, it takes time, and it has a "finishing condition". The finishing condition just means: it may succeed, or it may fail. Or it may hang.

Now, what do all the clever admins do? Well, you know the stop takes around 15 minutes to perform (that is, to perform with success).
So at a fixed time, let's say 20h00, we run this stop script.

Why do we stop the software? Ah, that is because we want to take a backup of the software, and with most software you need a cold backup if you want a good backup.
So let's give it some margin: say we only start our backup at 21h00, which means we allow 1 hour where we know 15 minutes would be enough.
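In crontab terms, that naive schedule looks something like this (the script names and paths are just placeholders for the example, and the 22h00 start is the restart mentioned further down):

Code:
# Hypothetical crontab: stop at 20:00, backup at 21:00, restart at 22:00.
# Nothing here checks whether the previous step actually finished or succeeded.
0 20 * * * /opt/app/bin/stop_app.sh
0 21 * * * /opt/app/bin/backup_app.sh
0 22 * * * /opt/app/bin/start_app.sh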

What is the problem now? The backup task obviously depends on the stopping task. But how do we make sure the backup task finds the software already down?
Well, we don't. At least not in this case; we just HOPE it stops in time.

Let me restate that: we HOPE the software stops in time.

What do the very smart admins do (as opposed to earlier; those were only the clever admins)? We build in some checks.

These checks cover the fact that we are dealing with synchronous tasks, and they acknowledge and respect the fact that the operating condition of task number 2, in this case, depends on the timely AND correct execution of task number 1.

You would think all of that is quite obvious, and yet I'm sure you'll find many of these setups if you look for them.

In this example it's about stop/backup/start, which is a typical pattern that recurs a lot. What is the result? Well, often the backup will be invalid, since it was taken while (part of) the system was still up. Most admins will gloss over this and state it's not an issue. Until the moment a restore needs to happen ... and "there is a problem restoring". If the stop wasn't needed, why was it programmed in the first place?

So how do you do it correctly?
Well, the second step needs to establish whether the first step has completed correctly or not.

Two different problem states typically occur:
  • step 1 failed
  • step 1 still runs

In the first case, step 1 failed, so step 2 logically doesn't make sense and should not start. The end. Before the next run, you MUST check why the stop script failed, and fix that.
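A minimal sketch of that check, assuming the stop script returns a proper exit code (the script names and paths are placeholders):

Code:
#!/bin/bash
#
# Sketch: only run the backup if the stop script completed successfully.
# stop_app.sh and backup_app.sh are placeholder names, not real scripts.

if ! /opt/app/bin/stop_app.sh; then
    echo "Stop failed, not starting the backup. Investigate before the next run." >&2
    exit 1
fi

/opt/app/bin/backup_app.sh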

Second case: step 1 is still running. That is much more difficult. Ultimately one of 2 things is happening:
  • the stop is taking longer, much longer or much much longer
  • the stop is hanging

These cases ARE technically different, and you should know which one is happening. The first is easy: you may just need to give the stop script more time. But it may also be an exceptional case. How do you solve this in a synchronous context? Well, you CAN build lock files, but you should know that ALL processes following this one will now be shifted in time.
Suppose you do a stop at 20h, a backup at 21h and a start at 22h ... well, the backup and start may now happen at any time. Like 9h in the morning. And you don't want that. The backup will be valid, yes, but now everything runs hours later than planned. So when you build lock files, you MUST use some kind of time-out, and that is genuinely complex. Many admins don't dare to build something like this, because it looks too complex. It actually is, but what better solution are you going to get?
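One way to sketch such a time-out, assuming the stop script keeps a lock file around while it runs (the lock path and the 90-minute limit are just example values):

Code:
#!/bin/bash
#
# Sketch: wait for the stop script's lock file to disappear, but give up
# after a hard time-out instead of waiting forever.
# /var/run/app_stop.lock and the 90-minute limit are placeholder values.

LOCK=/var/run/app_stop.lock
LIMIT=$((90 * 60))   # maximum wait in seconds
WAITED=0

while [ -f "$LOCK" ]; do
    if [ "$WAITED" -ge "$LIMIT" ]; then
        echo "Stop still not finished after 90 minutes, aborting the backup." >&2
        exit 1
    fi
    sleep 60
    WAITED=$((WAITED + 60))
done

/opt/app/bin/backup_app.sh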

Secondly, the hanging issue. That's a serious problem in itself. Nothing should be hanging (on shutdown, ...), so you must fix this ASAP. If you can't get the hang itself resolved, fix the stop script so that it deals with the hang.
Typically that means killing a lot of processes. If your architecture is built correctly, it could simply mean killing all processes of the user the software runs as (except the stop script itself).
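As a last resort that could look roughly like this, assuming the software runs under its own dedicated user ("appuser" is a placeholder, and this assumes the stop script itself does not run as that user):

Code:
#!/bin/bash
#
# Sketch: force-stop a hung application by killing everything owned by its
# dedicated user. "appuser" is a placeholder. Run this as root, not as
# appuser, otherwise you would kill this script as well.

pkill -TERM -u appuser   # ask the processes to terminate first
sleep 30
pkill -KILL -u appuser   # force-kill whatever is still left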

Or, you can always hope the stop script stops in time, and hope you don't need to restore the backup, that is also an option.
 


I created a 00start bash script for the daily cron jobs along with a zzstop file. These two files are in the /etc/cron.daily directory. The first creates a lock file called /tmp/cron.daily.lock and the second one removes it when the daily cron jobs are finished running. As a result I was able to create another bash script called waitdaily that looks like this:

Code:
#!/bin/bash
#
# Waits for the daily cron jobs to be completed.

while [ -f /tmp/cron.daily.lock ]; do
     sleep 1m
done

exit 0

# EOF
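The 00start and zzstop scripts themselves only need a couple of lines each; roughly like this (a sketch, since only their purpose is described above):

Code:
#!/bin/bash
# 00start: runs first in /etc/cron.daily and creates the lock file.
touch /tmp/cron.daily.lock
exit 0

Code:
#!/bin/bash
# zzstop: runs last in /etc/cron.daily and removes the lock file.
rm -f /tmp/cron.daily.lock
exit 0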

I also use lock files for the weekly cron jobs and some AV scans. This way I can schedule other jobs by stacking them up on the command line with ; characters, and wait until the system isn't busy before starting something like video processing or whatever else. My AV scans will look for /tmp/cron.daily.lock before starting clamscan, to make sure aide and similar cron jobs are done running first.
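For example, a one-liner like this waits for the daily jobs and then starts the heavy work (the scan target is just a placeholder, and it assumes waitdaily is on the PATH):

Code:
# Wait for the daily cron jobs, then run the scan.
waitdaily; clamscan -r /home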

Signed,

Matthew Campbell
 
What if the first job hangs, or runs much, much longer? Meaning the lock file would remain?
 
Yes, the lock file would remain if one of the daily jobs gets stuck, which has never happened. If your jobs are getting stuck then you may have bad sectors on your hard drive or some other hardware issue. That's why I check to see how everything is going from time to time, though generally not during the daily cron jobs themselves. I have top running to keep an eye on things and root is always logged in somewhere. A good system administrator is diligent in their duties. You can also use journalctl -xe to check on things.

Signed,

Matthew Campbell
 
