| Author |
Message |
big_bass

Joined: 13 Aug 2007 Posts: 1736
|
Posted: Sun 18 Dec 2011, 22:27 Post subject:
thread_saver Subject description: save forum threads |
|
Thinking about the forum downtime: it's good to back up stuff before problems occur.
*Mu and seaside had the idea for this, and seaside used gtkdialog*
http://www.murga-linux.com/puppy/viewtopic.php?t=62236&search_id=1975402649
I wanted to do the same thing, but I prefer Xdialog,
and I wanted to test the three-input option (--3inputsbox), so I rewrote the GUI part in Xdialog.
Major update
Updated and modified a lot to make big threads easier to handle:
it makes a folder, dates it, and renames and renumbers the files
so that it's easier on the browser to scroll quickly.
12-29-2011
Added long names and a filter to clean up poorly formatted names.
The end goal is that, from here, the files can be read and edited quickly
and you can filter out the unneeded posts.
| Code: |
#!/bin/bash
# (bash rather than plain sh: the script uses arrays)
# thread_saver big_bass completely rewritten to the basics and using Xdialog
# 12-29-2011
# added date to folder rewritten the download part and the naming
# of the files and numbering
# original idea was based on
# ThreadGet Seaside 11-24-2010 (Based on Mu's Fetchforum basic program)
# For use on phpBB forums
# update 3-29-2011 --add startpage, append to existing files
#------------------------------------------------
SEL=`Xdialog \
--title "thread_saver" \
--separator "\n" --stdout \
--3inputsbox "thread_saver" 0 0 \
"URL (download this link)" "$1" \
"end page number" "$2" \
"name the html file" "$3"`
# get the three values into one array, split on whitespace
SEL_ARRAY=($SEL)
THREAD=${SEL_ARRAY[0]}
NPAGE=${SEL_ARRAY[1]}
NAME=${SEL_ARRAY[@]:2:20}
# NAME gets long names: start at the third word
# and take 20 words as the max length
# clean up badly formatted names that have spaces and symbols
NAME_FIXED=`echo "$NAME" | tr ';"<>,+!@#$?%^*&(){}[]' ' ' | tr -s ' ' '_'`
#------------------------------------------------
add_date=`date "+%m-%d-%y"`
URL_NAME=`basename "$THREAD"`
mkdir -p "/root/Forum_Threads/${NAME_FIXED}-folder-${add_date}"
cd "/root/Forum_Threads/${NAME_FIXED}-folder-${add_date}" || exit 1
# zero start count fix: page 1 starts at post offset 0
NPAGE=$((NPAGE-1))
ALL_POSTS=$((15*NPAGE))
# simplified the renaming renumbering code a lot big_bass
for i in $(seq 0 15 $ALL_POSTS); do
N_ADJ=$((i/15))
wget -N "$THREAD&start=$i"
mv "$URL_NAME&start=$i" "${NAME_FIXED}_${N_ADJ}"
echo "$URL_NAME&start=$i" "${NAME_FIXED}_${N_ADJ}"
done
Xdialog --title "done" \
--msgbox "Thread downloaded to /root/Forum_Threads " 0 0 3000
|
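The loop in the script steps through the thread in blocks of 15 posts, which is what the start= parameter counts. A minimal sketch of that arithmetic on its own; the function name page_offsets is hypothetical, and 15 posts per page is assumed from the script rather than read from the forum:

```sh
# page_offsets N: print the start= offset of each page of an N-page thread,
# one per line. Assumes 15 posts per page, as the script does.
page_offsets() {
    last=$(( ($1 - 1) * 15 ))   # offset of the final page
    seq 0 15 "$last"
}

page_offsets 3
```

For a 3-page thread this prints 0, 15 and 30, one per line, so page 1 is start=0 and the end page number entered in the dialog maps to the last offset.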
 |

|
_________________ slackware 14
Last edited by big_bass on Thu 29 Dec 2011, 11:45; edited 12 times in total
|
 |
puppyluvr

Joined: 06 Jan 2008 Posts: 3052 Location: Chickasha Oklahoma
|
Posted: Mon 19 Dec 2011, 00:55 Post subject:
|
|
Hello,
Thanks....
I was thinking along the same lines, but never got that far...
Sure beats 1 pg @ a time...
Works great, and all on 1 page...
Nice, and useful..
All right now everyone Backup your threads...
@Edit..
Works on puppylinux.info too!!
_________________ "Close the "Windows", and open your eyes, to a whole new world"
http://puppylinuxstuff.meownplanet.net/puppyluvr/
http://theplpd.webs.com/
Nothing but Puppy since 2.15CE...
|
 |
jpeps
Joined: 31 May 2008 Posts: 2418
|
Posted: Mon 19 Dec 2011, 03:22 Post subject:
Re: thread_saver Subject description: save forum threads |
|
| big_bass wrote: |
| Code: |
# lets get the three values in 3 separate arrays
cat /tmp/downloader-info | tr '|' '\n' >/tmp/downloader-info2
select_array=(`cat /tmp/downloader-info2`)
echo ${select_array[0]}
echo ${select_array[1]}
echo ${select_array[2]}
THREAD=${select_array[0]}
NPAGE=${select_array[1]}
NAME=${select_array[2]}
#------------------------------------------------
|
|
another way:
| Code: |
var="$(cat /tmp/downloader-info)"
IFS="|"
set -- $var
THREAD="$1"
NPAGE="$2"
NAME="$3"
|
|
 |
big_bass

Joined: 13 Aug 2007 Posts: 1736
|
Posted: Mon 19 Dec 2011, 11:23 Post subject:
|
|
I went the longer way to avoid using IFS,
because you have to unset (restore) IFS
for any code that follows,
and because the pipe is a frequently used character.
I have already changed the --separator to "|", in Xdialog only.
*The URL also has quite a few "/" characters to filter,
since the default separator in Xdialog is "/" too.
The main point was to show how to use the three inputs in Xdialog;
it reduced all that gtkdialog code to just a few lines.
Joe
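One way to sidestep the IFS worry, sketched here as an assumption rather than what the script does: a variable assignment placed in front of read applies to that one command only, so IFS is back to normal on the next line. The SEL value is a made-up sample of --separator "|" output.

```sh
# Made-up sample of what the dialog might return with --separator "|"
SEL='http://example.com/viewtopic.php?t=1|3|my thread name'

# IFS='|' is in effect for this read only; nothing to restore afterwards
IFS='|' read -r THREAD NPAGE NAME <<EOF
$SEL
EOF

echo "$THREAD"
echo "$NPAGE"
echo "$NAME"
```

Because only "|" splits the fields, a name with spaces in it stays in one piece in NAME.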
|
 |
jpeps
Joined: 31 May 2008 Posts: 2418
|
Posted: Mon 19 Dec 2011, 12:13 Post subject:
|
|
| big_bass wrote: |
because you have to unset the IFS
for any code that follows
because the pipe is frequently used command
|
perhaps setting the --separator to " " ? Just an alternative...works fine the way you have it
| Code: |
var="$(cat /tmp/downloader-info)"
set -- $var
THREAD="$1"
NPAGE="$2"
NAME="$3"
|
|
 |
big_bass

Joined: 13 Aug 2007 Posts: 1736
|
Posted: Mon 19 Dec 2011, 12:36 Post subject:
|
|
Major update
Updated and modified a lot to make big threads easier to handle:
it makes a folder and renames and renumbers the files
so that it's easier on the browser to scroll quickly.
The end goal is that, from here, the files can be read and edited quickly
and you can filter out the unneeded posts.
Joe
jpeps
| Quote: |
perhaps setting the --separator to " " ? Just an alternative...works fine the way you have it |
I will look at that part again, thanks. This is still an alpha version. *I want to combine some other HTML tools I wrote with this.
| Code: |
#!/bin/sh
# thread_saver big_bass rewrote the GUI to use Xdialog to test the three input option
# 12-19-2011
# ThreadGet Seaside 11-24-2010 (Based on Mu's Fetchforum basic program)
# For use on phpBB forums
# update 3-29-2011 --add startpage, append to existing files
#------------------------------------------------
Xdialog --separator "|" --3inputsbox "Big Thread downloader" 0 0 "URL (download this link)" "$1" "end page number" "$2" "name the html file" "$3" 2> /tmp/downloader-info
# lets get the three values in 3 separate arrays
cat /tmp/downloader-info | tr '|' '\n' >/tmp/downloader-info2
select_array=(`cat /tmp/downloader-info2`)
echo ${select_array[0]}
echo ${select_array[1]}
echo ${select_array[2]}
THREAD=${select_array[0]}
NPAGE=${select_array[1]}
NAME=${select_array[2]}
#------------------------------------------------
n=$((NPAGE*15-15))
mkdir -p /root/Forum_Threads/
mkdir -p /tmp/Forum_Threads
cd /tmp/Forum_Threads
if [[ $SP =~ ^[0-9]+$ ]]; then
SP=$((SP*15-15))
else
SP=0
fi
# simplified big_bass
for i in $(seq $SP 15 $n) ; do
wget -O $(printf "%04d" $i) -c "$THREAD&start=$i"
done
# file renumber and rename
cd /tmp/Forum_Threads
START_NUMBER=1
NUM=0
ls -1 | sort -n >/tmp/list_forum_pages.txt
for i in `cat /tmp/list_forum_pages.txt`
do
echo "renumber file --> $NUM"
mv /tmp/Forum_Threads/$i /tmp/Forum_Threads/$NAME"_"$START_NUMBER.htm
let NUM=$NUM+1
let START_NUMBER=$START_NUMBER+1
done
mkdir -p /root/Forum_Threads/$NAME
mv $NAME* /root/Forum_Threads/$NAME
rm -r /tmp/Forum_Threads/
rm -f /tmp/downloader-info
rm -f /tmp/downloader-info2
rm -f /tmp/list_forum_pages.txt
Xdialog --title "done" \
--msgbox "Thread downloaded to /root/Forum_Threads " 0 0 3000
|
|
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Mon 19 Dec 2011, 15:31 Post subject:
|
|
If you can name the downloaded thread pages by pulling the title from <title>title</title>,
then it can be automated to get many threads.
I have been waiting for years for a GUI that produces the code needed to do matching. My short-term memory quickly forgets the syntax, and I have to start from scratch whenever I want to do this matching stuff. Sorry.
_________________
ASUS EeePC Flare series 1025C 4x Intel Atom N2800 @ 1.86GHz RAM 2063MB 800x600p ATA 320G
_-¤-_
<º))))><.¸¸.•´¯`•.#.•´¯`•.¸¸. ><((((º>
|
 |
big_bass

Joined: 13 Aug 2007 Posts: 1736
|
Posted: Mon 19 Dec 2011, 15:49 Post subject:
|
|
aarf
It's doable, but... in the one example here you can see all the spaces in the name. It can be done; it just needs some
adjustments, some conditioning, to make a "correct" file name... hey, no problem, that's easy to do.
<title>Puppy Linux Discussion Forum :: View topic - Classic Pup 2.14X -- Updated 2 series</title>
Joe
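A rough sketch of that conditioning, not taken from thread_saver itself: pull the text out of the <title> element, drop the forum prefix, and turn anything awkward into an underscore. The helper names page_title, topic_only and safe_name are hypothetical, and the sketch assumes the opening and closing title tags sit on one line, as in the example title.

```sh
# page_title FILE: print the text between <title> and </title>.
# Assumes both tags are on the same line.
page_title() {
    sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$1" | head -n 1
}

# topic_only: drop the forum prefix that ends in "View topic - "
topic_only() {
    sed 's/^.*View topic - //'
}

# safe_name STRING: keep letters, digits, dot, dash and underscore;
# every other run of characters becomes a single "_"
safe_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_' | tr -s '_'
}

# Usage sketch on an already downloaded page (file name is hypothetical):
# safe_name "$(page_title page_0.htm | topic_only)"
```

Whitelisting the characters to keep, instead of listing the ones to remove, also covers slashes and backslashes without having to name every symbol.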
|
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Mon 19 Dec 2011, 16:05 Post subject:
|
|
| big_bass wrote: | aarf
its doable but ... one example here you see all the spaces in the names it can be done it just needs some
adjustments some conditioning to be a "correct" file name ... hey no problem that's easy to do
<title>Puppy Linux Discussion Forum :: View topic - Classic Pup 2.14X -- Updated 2 series</title>
Joe |
OK, I think that, as well as the relevant title bits, a date from the first post could also feature in the name; that will eliminate duplicate names and make it easier to reference. Go further and also add the date of the last post, and it will be easier to do new backups. Something like
1.nov.2010 classic pup 2.14X -- Updated 2 series 20.nov.2011.htm, or whatever date format is easy to search or order in time. Possibly also include the original thread number in the name for future backup reference.
(I'll pull my request for .mht image-containing files for now, till it progresses further.)
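On the point about a date format that is easy to order in time: year-month-day sorts chronologically under a plain ls. A rough sketch, using the date of the backup itself (pulling the first and last post dates out of the HTML is not attempted here); dated_name and its arguments are hypothetical.

```sh
# dated_name TITLE THREAD_NUMBER: build a sortable file name of the form
# YYYY-MM-DD_t<thread number>_<title>.htm from the current date
dated_name() {
    printf '%s_t%s_%s.htm\n' "$(date +%Y-%m-%d)" "$2" "$1"
}

dated_name "classic_pup_2.14X" 62236
```

Putting the thread number straight after the date keeps later backups of the same thread easy to match up.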
|
 |
seaside
Joined: 11 Apr 2007 Posts: 832
|
Posted: Mon 19 Dec 2011, 16:15 Post subject:
|
|
big_bass,
Nice work. (It just shows what can happen when someone who knows what they're doing gets a hold on things)
Also, the file I/O could be eliminated:
| Code: |
SEL=`Xdialog --separator "|" --stdout --3inputsbox "Thread downloader" 0 0 "URL (download this link)" "$1" "end page number" "$2" "name the html file" "$3"`
THREAD=`echo "$SEL" | cut -f1 -d'|'`
NPAGE=`echo "$SEL" | cut -f2 -d'|'`
NAME=`echo "$SEL" | cut -f3 -d'|'`
|
For the life of me, I can't remember why I thought wget had to be run in a terminal for this to work
Regards,
s
|
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Mon 19 Dec 2011, 16:19 Post subject:
|
|
It would probably be a good idea to pop over to the phpBB devs' forum and check that we're not reinventing the wheel.
|
 |
big_bass

Joined: 13 Aug 2007 Posts: 1736
|
Posted: Mon 19 Dec 2011, 23:39 Post subject:
|
|
Hey seaside,
you did a great job,
and I also took your suggestion about the
code snippet you posted today and used it, thanks.
I got a little 'array happy' in that part.
*It's a habit for me to use extra output files when testing
so I can debug stuff quickly. Xdialog either works or it doesn't;
there are no good error messages, but it's mostly easy.
@jpeps: hey, you had a good code snippet too, and it worked,
but I went with seaside's.
@aarf: I have to do some heavy testing with the auto-naming part before I add it.
Some people even included slashes, backslashes, spaces and other symbols in the file names, and those don't play nicely with making directories.
Updated the main post with seaside's suggested
shortened code snippet.
Joe
|
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Tue 20 Dec 2011, 02:27 Post subject:
|
|
| big_bass wrote: |
@aarf I have to do some heavy testing with the auto naming part before I add it
some people even included slashes ,back slashes , spaces and other symbols in the file names that doesnt play nicely with making directories
Joe |
There has got to be a ready-made code snippet that does the job; it's just a matter of knowing where to find it.
Perhaps autoname them by their thread number for now. That is still useful and unique, and a Google search can still be used to find content.
|
 |
jpeps
Joined: 31 May 2008 Posts: 2418
|
Posted: Tue 20 Dec 2011, 04:14 Post subject:
|
|
| big_bass wrote: |
@hey jpeps you had a good code snippet too and worked
but I went with seasides |
I agree....so
| Code: |
SEL=`Xdialog --separator " " --stdout --3inputsbox "Thread downloader" 0 0 "URL (download this link)" "$1" "end page number" "$2" "name the html file" "$3"`
set -- $SEL
THREAD="$1"
NPAGE="$2"
NAME="$3"
|
|
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Tue 20 Dec 2011, 09:28 Post subject:
|
|
| aarf wrote: | | big_bass wrote: |
@aarf I have to do some heavy testing with the auto naming part before I add it
some people even included slashes ,back slashes , spaces and other symbols in the file names that doesnt play nicely with making directories
Joe |
there has got to be a ready made code snippet that does the job, just a matter of knowing where to find it
perhaps autoname them to their thread number for now. still useful and unique and google search can still be used to find content. |
Naming them by their thread number won't need any matching at all. It will be simple to just replace the name in the code with the number of the step variable. It should be ready to start already (but I am not in thinking mode at present). It will need a test for empty downloads so that a page number isn't needed, or it will need to match and thus get the number of pages from the first page.
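For naming by thread number, the number is already in the URL typed into the dialog, as the t= parameter. A minimal sketch of pulling it out; thread_id is a hypothetical helper, and it assumes a viewtopic.php?t=NNN style link like the ones in this thread.

```sh
# thread_id URL: print the number after t= in a viewtopic URL,
# or nothing if the URL has no t= parameter
thread_id() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]t=\([0-9][0-9]*\).*/\1/p'
}

thread_id 'http://www.murga-linux.com/puppy/viewtopic.php?t=62236&search_id=1975402649'
# prints 62236
```

The result could stand in for the name field, so only the URL and the end page number would need to be entered.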
|
 |
|