Converting unusual date formats

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#1 Post by MochiMoppel »

I tried to understand what kjdixo is doing in his thread "Script to save Thunderbird emails using YAD and xdotool" and I stumbled on his way to change the date format of Thunderbird emails.

Here is his basic code. A bit too complicated for my taste, but it gets the job done:

Code: Select all

#! /bin/bash
from="DonaldDuck"
var1="06/11/15 13:21" #  November 6th 2015 at 1:21 pm
var2=$(echo $var1 | sed 's/\//-/g' | sed 's/ /-/g' | awk '{print substr($0,7,2) substr($0,3,4) substr($0,1,2) substr($0,9,6)}') 
var3="20"$var2"-"
echo $var3$from        # 2015-11-06-13:21-DonaldDuck
Normally converting dates into any format is a piece of cake for the date command, so I thought that date could handle this job with a one-liner. It turns out that date chokes on the dd/mm/yy format used by kjdixo's Thunderbird.

First attempt. date reads the string as mm/dd/yy, so November turns to June. Good idea, but wrong.

Code: Select all

date -d "$var1" +%Y-%m-%d-%R-"$from"  # 2015-06-11-13:21-DonaldDuck
Let's replace '/' with '-' and see what happens. Now date reads the string as yy-mm-dd and we are in 2006.

Code: Select all

var1=${var1//\//-}                    # 06-11-15 13:21
date -d "$var1" +%Y-%m-%d-%R-"$from"  # 2006-11-15-13:21-DonaldDuck
Obviously date, i.e. /bin/date, the coreutils version, has no clue what 06/11/15 means, and what's worse: There is no way to explain it to her. Only one last chance remains: busybox :lol: No joke, it works:

Code: Select all

busybox date -D '%d/%m/%y %H:%M' -d "$var1" +%Y-%m-%d-%R-"$from"  # 2015-11-06-13:21-DonaldDuck
The -D option tells busybox how to interpret the -d string. Very nice. When was the last time busybox won against the coreutils?

Edit: On my system busybox uses 10 times less CPU time than date, despite having to translate the -d string. date knows much more common formats than busybox and therefore allows shorter (and more portable) code, but still busybox is hard to beat in terms of flexibility.
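
Where busybox is not available, one workaround is to reorder the fields in the shell itself before (or instead of) calling date. The following is only a minimal sketch: the function name dmy_to_stamp is made up for illustration, and, like kjdixo's original code, it assumes a two-digit year in the 2000s and exactly the "dd/mm/yy HH:MM" layout.

```sh
#!/bin/sh
# Hypothetical helper: "dd/mm/yy HH:MM" -> "20yy-mm-dd-HH:MM-"
# Assumption: the year is in the 2000s, as in the original script.
dmy_to_stamp() (
    # The function body is a subshell, so the IFS and set -f changes stay local
    IFS='/ '
    set -f                      # no globbing while the unquoted $1 is split
    set -- $1                   # now $1=day $2=month $3=year $4=HH:MM
    [ $# -eq 4 ] || exit 1      # reject anything that is not four fields
    # ${x#0} strips one leading zero so printf does not read 08 or 09 as octal
    printf '20%s-%02d-%02d-%s-\n' "$3" "${2#0}" "${1#0}" "$4"
)

from="DonaldDuck"
var1="06/11/15 13:21"
echo "$(dmy_to_stamp "$var1")$from"    # 2015-11-06-13:21-DonaldDuck
```

Unlike the busybox version, this does not check whether the result is a valid date; it only rearranges and pads the fields.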

Last edited by MochiMoppel on Fri 02 Sep 2016, 02:43, edited 1 time in total.

kjdixo
Posts: 153
Joined: Sun 13 Sep 2009, 21:13

#2 Post by kjdixo »

Thanks MochiMoppel
Pleased to see someone is looking at the code in my sometimes rambling threads.
My code is usually the first thing that works for me, so there will be other approaches.
I will experiment with your method in the near future.
Timestamping is used a lot everywhere.
This thread will be a very useful reference.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#3 Post by MochiMoppel »

kjdixo wrote:I will experiment with your method in the near future.
Timestamping is used a lot everywhere.
This thread will be a very useful reference.
And? Did you experiment?
Now, 2 months later, I see that in your new script version for Sylpheed you still use an awful lot of code to change a time stamp of Date: Fri, 4 Nov 2016 12:55:38 +0000 into 2016-11-04-12:55-

busybox date could do this with a one-liner. It would reduce 46 lines of your code to just 1 :wink:

kjdixo
Posts: 153
Joined: Sun 13 Sep 2009, 21:13

#4 Post by kjdixo »

Sorry not yet, I never got around to it, I will do it now.
I read your comments about busybox which at the time I noted for future reference.
I did consider it for a split second, but thought busybox might not be able to handle such a complicated string. I was wrong and should have gone and checked it out. More haste, less speed, as they say.
MochiMoppel wrote:busybox date could do this with a one-liner. It would reduce 46 lines of your code to just 1
(that is, turning Date: Fri, 4 Nov 2016 12:55:38 +0000 into 2016-11-04-12:55-)
For reference my code was 27 lines less the comments.
In fact the multiple v1 and v4 entries are a safe replace method using pure bash.
There are other methods which would reduce the multiple lines of v1= and v4= to 2 lines.
Bringing the total down to 8 lines.
I tend not to worry about things like that if speed is not an issue.
But I must say I will be impressed if it can be done with a one liner.
Stack Overflow answers are not always the only or the best answers.

Code: Select all

v1=${v1/Jan/01}
v1=${v1/Feb/02}
v1=${v1/Mar/03}
v1=${v1/Apr/04}
v1=${v1/May/05}
v1=${v1/Jun/06}
v1=${v1/Jul/07}
v1=${v1/Aug/08}
v1=${v1/Sep/09}
v1=${v1/Oct/10}
v1=${v1/Nov/11}
v1=${v1/Dec/12}
var2=$(echo $v1 | sed 's/ /-/g')
var2=${var2:(-25)}
var3=${var2:0:16}
v4=$(echo $var3 | sed '/^-/s/./0/1')
v4=${v4/201/1}
v4=${v4/202/2}
v4=${v4/203/3}
v4=${v4/204/4}
v4=${v4/205/5}
v4=${v4/206/6}
v4=${v4/207/7}
v4=${v4/208/8}
v4=${v4/209/9}
var5=$(echo $v4 | sed 's/\//-/g' | awk '{print substr($0,7,2) substr($0,3,4) substr($0,1,2) substr($0,9,6)}')
v6="20"$var5"-"
Thanks MochiMoppel, that would be great if I could do that. I will try it, and if I get stuck I will let you know.
I will do it before I post the new topic for Sylpheed.
Thanks very much.

kjdixo
Posts: 153
Joined: Sun 13 Sep 2009, 21:13

#5 Post by kjdixo »

Looks promising
This works

Code: Select all

# Fri, 4 Nov 2016 12:55:38 +0000

# 4/11/16 12:55:38 as experiment 1

v1="4/11/16 12:55:38"

# with 'busybox date' don't need to worry about day being 4 or 04
# %d/%m/%y %H:%M tried adding %S but didn't work (did not need seconds anyway)

v6=$(busybox date -D '%d/%m/%y %H:%M' -d "$v1" +%Y-%m-%d-%R-"$from")

# 2016-11-04-12:55-

echo $v6 
I really like the fact you don't need to worry about a leading zero on the day (4 or 04) and the hyphens are inserted within the one liner.
The nice feeling you get when you can drop in and out of PHP so easily.
I like it.
However, I hope it can deal with changing the month Nov to 11.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#6 Post by MochiMoppel »

kjdixo wrote:I really like the fact you don't need to worry about a leading zero on the day (4 or 04) and the hyphens are inserted within the one liner
:lol: You are starting to grasp the beauty.

I tried to figure out what you are talking about and ran Sylpheed. I selected everything starting from "Date:" (see screenshot). That produced a multiline time stamp:
v1="Date: Sun, 6 Nov 2016 17:50:10 +0900
X-Mailer: Sylpheed 3.3.0 (GTK+ 2.24.10; i686-pc-linux-gnu)
".

You will see how easy it is to turn this monster into 2016-11-06-17:50-
Good luck!
Attachment: sylpheed_date_selection.png (21.61 KiB)

kjdixo
Posts: 153
Joined: Sun 13 Sep 2009, 21:13

#7 Post by kjdixo »

After a 3 hour break.
Took another look and got this to work using date info from PHP strftime()

Code: Select all

busybox date -D '%a, %d %b %Y %H:%M' -d "Fri, 4 Nov 2016 12:55" +%Y-%m-%d-%R-"$from"
Fri, 4 Nov 2016 12:55
2016-11-04-12:55-

Getting closer to the goal, now I need to try the multi line example that you highlighted.

kjdixo
Posts: 153
Joined: Sun 13 Sep 2009, 21:13

#8 Post by kjdixo »

Come to think of it I don't really need to go as far as a multi line input.
It will be a useful exercise for future reference though.

So this is now what I want and thanks for showing me this useful tool.
I already knew about busybox but had not thought of using its various tools for everyday tasks.

Code: Select all

busybox date -D 'Date: %a, %d %b %Y %T +0000' -d "Date: Fri, 4 Nov 2016 12:55:38 +0000" +%Y-%m-%d-%R-"$from"
I shall incorporate this into my Sylpheed script and check it runs the same as before.
Then I will post it as a new topic, and give you credit in the text for this gem.
Thanks.

Extra note: I have left the time zone +0000 as a constant as most people will be able to factor this in and adjust it to their own timezone in the script.
However it would be nice if there was a system variable to drop in there, that would detect the timezone.
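
For reference, the local UTC offset can be obtained from date itself via the %z format. The sketch below only shows how the value could be read and dropped into a pattern; the variable names are made up for illustration. Keep in mind that the offset in a mail header is the sender's, so it will not necessarily match the offset of the local machine.

```sh
#!/bin/sh
# %z prints the local UTC offset in +hhmm form, e.g. +0000 or +0900
tz_offset=$(date +%z)
echo "Local UTC offset: $tz_offset"

# Hypothetical use: build the -D pattern with the local offset
# instead of the hard-coded +0000
fmt="Date: %a, %d %b %Y %T $tz_offset"
echo "$fmt"
```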

kjdixo
Posts: 153
Joined: Sun 13 Sep 2009, 21:13

#9 Post by kjdixo »

Splendid, it worked and I have updated the Thunderbird topic posts regarding Sylpheed.
Before I post it as a new topic for Sylpheed (instead of part of the Thunderbird thread) do you have any thoughts or ideas on automating the timezone (+0000), there must be a way to drop in a system variable that knows the timezone of the local machine?
Or maybe busybox can be made to ignore characters in specific positions as I don't need the timezone.
I could always truncate the string like I did originally.
I will try to find an easy and foolproof solution and hold fire on the new topic until you have replied.
Thanks.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#10 Post by MochiMoppel »

kjdixo wrote:do you have any thoughts or ideas on automating the timezone (+0000), there must be a way to drop in a system variable that knows the timezone of the local machine?
It's easier than you think.
All of the following 3 examples produce the same result. What you define in the -D FMT string is what makes the difference.

Here again your solution:
busybox date -D 'Date: %a, %d %b %Y %T +0000' -d "Date: Fri, 4 Nov 2016 12:55:38 +0000" +%Y-%m-%d-%R-

...and here is mine. Note that my time zone is different:
busybox date -D 'Date: %a, %d %b %Y %R' -d "Date: Fri, 4 Nov 2016 12:55:38 +0900" +%Y-%m-%d-%R-
which would be the same as:
busybox date -D 'Date: %a, %d %b %Y %R' -d "Date: Fri, 4 Nov 2016 12:55blablablabla" +%Y-%m-%d-%R-

The -D FMT "looks" into the time stamp only as far as it is told to look, i.e. the part covered by FMT (highlighted in red in the above examples), and ignores the rest. Even if your time stamp continues into the next line it will be ignored. Pretty neat, isn't it?
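
Applied to the multi-line Sylpheed selection from post #6, this could look like the sketch below. The wrapper name header_stamp and the busybox availability check are additions for illustration. LC_ALL=C is set on the assumption that %a and %b should match the English names used in mail headers. According to post #6 the target for this input is 2016-11-06-17:50-; that trailing text is ignored is taken from the description above and may depend on the busybox build.

```sh
#!/bin/sh
# Hypothetical wrapper around the busybox command shown above
header_stamp() {
    # Only "Date: Sun, 6 Nov 2016 17:50" is described by the -D pattern
    LC_ALL=C busybox date -D 'Date: %a, %d %b %Y %R' -d "$1" +%Y-%m-%d-%R-
}

v1="Date: Sun, 6 Nov 2016 17:50:10 +0900
X-Mailer: Sylpheed 3.3.0 (GTK+ 2.24.10; i686-pc-linux-gnu)
"

if command -v busybox >/dev/null 2>&1; then
    header_stamp "$v1"
else
    echo "busybox not found, nothing to demonstrate" >&2
fi
```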

kjdixo
Posts: 153
Joined: Sun 13 Sep 2009, 21:13

#11 Post by kjdixo »

That is extremely good news and you explain it very clearly.
This thread will definitely be a useful resource for others.
The internet is awash with clues on various technical tricks.
I sometimes suspect that the people who write these things are assuming that a tiny clue is sufficient and everyone will know how to proceed and be able to use a bit of detective work.
To be a good detective and problem solver you need to not give up too easily.
I did suspect that busybox date might ignore parts of the string.

My quote
Or maybe busybox can be made to ignore characters in specific positions as I don't need the timezone.
With me it can be a case of 'more haste less speed' because as soon as I have something that works, I go with it.
I don't like to spend too much time messing around when time is precious.
However it is, in this case, well worth the extra effort.
Your red highlighting of a multiline comment in a previous post was leading somewhere and I failed to take the bait.
Sorry about that and thanks again for the great learning curve.
I just wish all software tools and programs came with a comprehensive description and a large set of usage examples.
It would save a lot of detective work and experimentation.
Your solution is more than 'pretty neat' it's awesome.
It will now spring to mind whenever I see a date string.
Thanks for sharing it.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#12 Post by MochiMoppel »

Translate a month number into locale's full month name (e.g. translate '3' into 'March')

Translating a single number into the name of the month is normally done by first checking if the number is valid and then by matching each number with the corresponding month name, e.g. with a case statement. Needs quite a bit of code.
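
For comparison, the conventional approach might look like the following sketch. The function name month_name is made up for illustration, and the month names are hard-coded in English rather than taken from the locale, so it covers only part of what the busybox version does.

```sh
#!/bin/sh
# Hypothetical helper: validate a month number and print its English name
month_name() {
    case $1 in
        1|01) echo January ;;
        2|02) echo February ;;
        3|03) echo March ;;
        4|04) echo April ;;
        5|05) echo May ;;
        6|06) echo June ;;
        7|07) echo July ;;
        8|08) echo August ;;
        9|09) echo September ;;
        10)   echo October ;;
        11)   echo November ;;
        12)   echo December ;;
        *)    echo "invalid month: '$1'" >&2
              return 1 ;;
    esac
}

for n in 3 03 30; do
    month_name "$n"     # March, March, then an error message on stderr
done
```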

Busybox makes it very easy:

Code: Select all

read -p "Enter number of month (1 - 12), then press ENTER: " NUM
busybox date -D %m  -d "$NUM"   +%B
busybox date -D %m█  -d "$NUM█"  +%B
The first of the above versions works as long as the user inputs a valid number, but since it looks for a string that starts with a valid number it returns wrong results when the user types more than just the number.

The second version is rock solid and accepts only valid numbers. The trick is to add a character to the -d string, a character that the user did not and could not input from the keyboard, e.g. a BEL character ($'\a') or fancy unicode. The -D option will look for a valid month number (%m), followed by this "stopper" character.
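
With a BEL character as the stopper, the second version could be written as in the sketch below. The function name month_from_number is made up for illustration. Pinning the day of month to 1 is an addition that is not in the commands above: it is there in case the implementation fills the fields it did not parse from the current date, which on the 29th to 31st could roll a short month over into the next one.

```sh
#!/bin/sh
# Hypothetical helper built on the second busybox command above
month_from_number() {
    stop=$(printf '\a')     # BEL: a character the user is unlikely to type
    # "%d " / "1 " pins the day of month (an assumption, see the note above)
    LC_ALL=C busybox date -D "%d %m$stop" -d "1 $1$stop" +%B
}

if command -v busybox >/dev/null 2>&1; then
    month_from_number 3      # expected to print the name of the third month
    month_from_number 30     # expected to fail with "invalid date"
else
    echo "busybox not found, nothing to demonstrate" >&2
fi
```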

In the following example only the first 2 inputs are valid and accepted by the 2nd busybox command.

Code: Select all

Enter number of month (1 - 12), then press ENTER: 3
March
March

Enter number of month (1 - 12), then press ENTER: 03
March
March

Enter number of month (1 - 12), then press ENTER: 30
March
date: invalid date '30█'

Enter number of month (1 - 12), then press ENTER: -3
date: invalid date '-3'
date: invalid date '-3█'

Enter number of month (1 - 12), then press ENTER: 123
December
date: invalid date '123█'

Enter number of month (1 - 12), then press ENTER: 111
November
date: invalid date '111█'
Of course it is also possible to translate a user input of 'March' (case insensitive!) into '3' by exchanging %m and %B, but I can't think of a case where this would be useful.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#13 Post by technosaurus »

Since this is in the programming thread, I just thought I would point out that busybox's shells also have substring manipulation capabilities similar to sed. So to replace all slashes and spaces with hyphens you can just use this bashism, which is supported by almost all shells except dash (busybox hush and ash both support it):

Code: Select all

string="/home/joe/some file with spaces"
echo ${string//[\/ ]/-}
and the read command uses a variable called $IFS for separators, so if you have something like "Date:09/21/77, garbage data", you can do something like:

Code: Select all

xfrm_date(){ #assumes data is read from stdin - adjust to suit
  local IFS=":/,"
  read DateLabel Month Day Year Garbage
  echo "$DateLabel: $Year-$Month-$Day"
}
If you want to write tight shell code, learn these two concepts and case statements. That will let you replace almost all external calls to sed, grep, cut, tr and many others ... while making the code an order of magnitude faster. Learning globbing and regular expressions is the next step.
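
As an illustration of read plus a case statement, the mail header discussed earlier in this thread could be converted without any external command. This is only a sketch: the function name header_to_stamp is made up, and it assumes the layout "Date: Fri, 4 Nov 2016 12:55:38 +0000" with English month abbreviations and a time that includes seconds.

```sh
#!/bin/sh
# Hypothetical helper: "Date: Fri, 4 Nov 2016 12:55:38 +0000" -> "2016-11-04-12:55-"
header_to_stamp() (
    # read consumes only the first line, so further header lines are ignored;
    # everything after the time (the zone) lands in "rest"
    read -r label wday day mon year time rest <<EOF
$1
EOF
    case $mon in
        Jan) mon=01 ;; Feb) mon=02 ;; Mar) mon=03 ;; Apr) mon=04 ;;
        May) mon=05 ;; Jun) mon=06 ;; Jul) mon=07 ;; Aug) mon=08 ;;
        Sep) mon=09 ;; Oct) mon=10 ;; Nov) mon=11 ;; Dec) mon=12 ;;
        *)   exit 1 ;;          # unknown month: give up
    esac
    # ${day#0} avoids an octal reading of 08/09; ${time%:*} drops the seconds
    printf '%s-%s-%02d-%s-\n' "$year" "$mon" "${day#0}" "${time%:*}"
)

header_to_stamp "Date: Fri, 4 Nov 2016 12:55:38 +0000"    # 2016-11-04-12:55-
```

Unlike busybox date, this does not validate the day or the time; it only rearranges what it finds.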

Anytime I see sed grep awk and cut in the same shell script, I know it is a result of either cut+paste programming or an attempt to be POSIX shell compliant (no bashisms) ... which is the only reason the old hotplug scripts were slow to begin with. If they had just updated POSIX to include substring manipulation, we never would have needed systemd or its predecessors.
Check out my github repositories (https://github.com/technosaurus). I may eventually get around to updating my blogspot (http://bashismal.blogspot.com).

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#14 Post by MochiMoppel »

technosaurus wrote:

Code: Select all

string="/home/joe/some file with spaces"
echo ${string//[\/ ]/-}
Good for short strings, but you have to know where to use it and where not, otherwise it may make your code an order of magnitude slower

Code: Select all

string=$(< /tmp/networkmodules)

for i in {1..10}; do
echo "${string// /_}"
done

for i in {1..10}; do
echo "$string" | sed 's/ /_/g'
done
bash:
real 0m2.029s
user 0m1.883s
sys 0m0.023s

sed:
real 0m0.210s
user 0m0.053s
sys 0m0.063s

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#15 Post by technosaurus »

Of course if you don't bother to check if sed is available before using it in order to provide a fallback AND you do an order of magnitude of useless looping AND use a large data set AND you don't specify LANG=C AND use bashisms that require bash instead of ash or hush, you will get those kinds of results. 99% of the data (including the data previously referenced in this thread) that are processed by shell scripts are small enough (usually 1 line) that they will be faster by using shell builtins... if not, it could probably be rewritten in awk which has the functionality of sed, grep, tr, cut and many others builtin and optimized.

It is still possible to do this sort of stuff with a POSIX shell though by using $IFS and the fact that $@ expands into a numerical array of sorts that can be iterated over using shift.

Here is a non-production POSIX example to replace delimiter characters with another string.

Code: Select all

#!/bin/sh
replace_delims(){
	separator="$1"
	shift
	output="$1"
	shift
	while [ $# -gt 0 ]; do	# loop on the argument count so empty fields do not end it early
		output="$output$separator$1"
		shift
	done
	echo "$output"
}

string="/home/joe/some file with spaces.jpg"
IFS=" /"
replace_delims "-" $string
Shells without substring manipulation can be a real PITA
