Print bookmarks.html with the formatting



G'day @fixit7 :)

You seem to have a penchant for placing Threads in Ubuntu, whereas some of the subject matter is more General. :D

I will move this to Command Line, because I am guessing a certain Member or two will spot it, and suggest a script.

Cheers, and I will watch with interest.

Wizard
 
Sorry about that. I forgot that some topics are not specific to Ubuntu.
 
Is there a way to print my Firefox bookmarks without all the HTML formatting?

For example, I would want just "http://resources.hewitt.com/centerpoint/" to be printed.

Try this:
Code:
lynx -dump http://resources.hewitt.com/centerpoint/

Though that page has moved. This will be more interesting:
Code:
lynx -dump "https://aura.alight.com/proxypu/servlet/02017_auth?linkId=FRAUD"

BTW, if you want to use the clipboard, install xclip. Then you can right click and "copy link address" and then use this command:
Code:
lynx -dump "$(xclip -sel clip -o)"
 
And if you don't have lynx and xclip, you can use the standard GNU Unix toolset.

After a tiny bit of trial and error, this one-liner worked in Cygwin to extract the links from the bookmarks.html I exported from Firefox on my Windows PC at work.
Code:
\grep HREF= ~/bookmarks.html | awk '{print $2}' > ~/bookmarks.txt; sed -i -- '/place:/d; s/HREF=//g; s/"//g' ~/bookmarks.txt
Assuming that the Linux version exports the bookmarks in the same format, the above one-liner should work in Linux too.

The grep command searches for lines in ~/bookmarks.html that contain the string "HREF=".

Matching lines are piped to awk, which prints the 2nd whitespace-separated field. In Firefox's export, that field is the HREF= attribute holding a website URL. The shell redirection writes awk's output to a new file called ~/bookmarks.txt.

Then we use sed on ~/bookmarks.txt to strip the HREF= prefix and the double quotes that enclose the URLs/links for our bookmarks.

We're also ignoring lines that contain "place:".
URLs with "place:" in them are used internally by Firefox and contain metadata about any folders/subfolders you have in Firefox's bookmark manager. So we want to exclude those URLs from our final output too.

The -i option to sed means that sed will edit the input file in place. So any changes are made directly to ~/bookmarks.txt.
So in the end, we should just end up with a text file with a bunch of website URLs.
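To make the steps above concrete, here is a small sketch that runs the same pipeline on a made-up sample. The sample lines, the temp directory and the example.org entry are assumptions for illustration; a real export has more attributes per line. It also assumes GNU sed, since -i behaves differently on BSD/macOS.

```shell
# Hypothetical sample in the style of a Firefox bookmarks export.
tmpdir=$(mktemp -d)
cat > "$tmpdir/bookmarks.html" <<'EOF'
<DL><p>
    <DT><H3 ADD_DATE="1500000000">Work</H3>
    <DL><p>
        <DT><A HREF="http://resources.hewitt.com/centerpoint/" ADD_DATE="1500000001">Centerpoint</A>
        <DT><A HREF="place:sort=8&maxResults=10" ADD_DATE="1500000002">Most Visited</A>
        <DT><A HREF="https://www.example.org/" ADD_DATE="1500000003">Example</A>
    </DL><p>
</DL><p>
EOF

# Step 1: keep lines containing HREF= and print the 2nd whitespace-separated field.
grep HREF= "$tmpdir/bookmarks.html" | awk '{print $2}' > "$tmpdir/bookmarks.txt"

# Step 2: delete place: entries, then strip HREF= and the quotes, editing the file in place.
sed -i -- '/place:/d; s/HREF=//g; s/"//g' "$tmpdir/bookmarks.txt"

cat "$tmpdir/bookmarks.txt"
```

On this sample the folder line is skipped by grep (no HREF=), the place: entry is dropped by sed, and the two ordinary URLs are left, one per line.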

Job done! Hopefully?!

And before anybody says anything: yes, I do have to use Windows at work, and I don't get any choice about that. But I try to use as much free software as possible. Sometimes Cygwin is the only thing that keeps me sane!

But at home, I'm 100% Linux and free software! XD
 
And if you don't have lynx and xclip, you can use the standard GNU Unix toolset. ...

Code:
\grep HREF= ~/bookmarks.html | awk '{print $2}' > ~/bookmarks.txt; sed -i -- '/place:/d; s/HREF=//g; s/"//g' ~/bookmarks.txt

I think this does the same, but only uses awk in one pass.
Code:
awk '/place:/{next}; /HREF=/{gsub(/HREF=|"/,"",$2); print $2}' bookmarks.html > bookmarks.txt

The problem is, both depend on a very particular HTML format. If you're only using it on bookmark pages--great. But if you then decide to try it on some website page you found--not so great.
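As a rough illustration of that fragility, here is a sketch that does not rely on the URL being the 2nd field. It assumes grep supports -o (GNU and BSD grep do; it is not in POSIX), and the sample file is made up. It is still pattern matching rather than real HTML parsing, so single-quoted or unquoted attributes would be missed.

```shell
# Hypothetical input: two links on one line, lowercase href, an extra attribute first.
tmpdir=$(mktemp -d)
cat > "$tmpdir/page.html" <<'EOF'
<p><a class="x" href="https://example.org/a">A</a> and <a href="https://example.org/b">B</a></p>
<DT><A HREF="place:sort=8" ADD_DATE="1">Recent</A>
<DT><A HREF="https://example.org/c" ADD_DATE="2">C</A>
EOF

# grep -o prints every match on its own line, -i ignores the case of "href".
# sed then strips everything up to the opening quote, plus the closing quote.
# The last grep drops Firefox's internal place: entries.
urls=$(grep -oi 'href="[^"]*"' "$tmpdir/page.html" \
  | sed 's/^[^"]*"//; s/"$//' \
  | grep -v '^place:')

printf '%s\n' "$urls"
```

With the field-based one-liners, the first line of this sample would yield class="x" rather than a URL, and the second link on that line would be lost.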

BTW, there's another package you can install to do this, python-html2text or python3-html2text. But I found it to be temperamental.
 
Thanks to all for your help.

Found this too.

lynx --dump ./bookmarks.html > file.txt
 
I will move this to Command Line, because I am guessing a certain Member or two will spot it, and suggest a script.

Wizard lucks in :D

Ken, meet Jas. Jas, meet Ken.

Cheers

Wizard
 

Thanks Ken.
Yes, I agree your awk one-liner is a much more elegant solution than my horrible hack!

In my defense, I was writing my post at the end of the work day yesterday, whilst one of my backup scripts was running. I had to shut my machine down and run out to catch my bus home as soon as my backups were finished. So I was in a bit of a rush and just posted what I had. I did mean to write a little more, but ran out of time.

I also meant to revisit this thread when I got home yesterday evening to explain that my solution was a bit of a hack and to try to come up with a more elegant one liner using only sed or awk. But I fell asleep last night shortly after eating, so it didn't happen!

With sed's and awk's built-in pattern matching capabilities, there was no need for me to use grep. And I also know that generally speaking, if you find yourself using sed and awk together, it usually means that you really only need awk. So I did break a few of the golden rules of scripting there. But again, it was a quick and dirty hack, off the top of my head, composed after a quick look at the formatting of the links in Firefox's bookmarks.html.

So it was very specific to Firefox's bookmarks.html and a bit of a hideous hack. But it did the job! XD
Thanks again for the awk one liner... Much better than my initial effort!
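
For anyone curious about the sed-only route mentioned above, one possible version might look like the sketch below. This is an illustration, not something posted in the thread, and the sample lines are made up. Like the other one-liners, it assumes one link per line, as in Firefox's bookmarks.html.

```shell
# Hypothetical sample lines in the style of a Firefox bookmarks export.
tmpdir=$(mktemp -d)
cat > "$tmpdir/bookmarks.html" <<'EOF'
        <DT><A HREF="http://resources.hewitt.com/centerpoint/" ADD_DATE="1">Centerpoint</A>
        <DT><A HREF="place:sort=8&maxResults=10" ADD_DATE="2">Most Visited</A>
        <DT><H3 ADD_DATE="3">Folder</H3>
EOF

# -n suppresses sed's default output.
# /place:/d skips Firefox's internal entries.
# The substitution keeps only the text between HREF=" and the next quote,
# and the p flag prints the line only when that substitution succeeded.
sed_urls=$(sed -n '/place:/d; s/.*HREF="\([^"]*\)".*/\1/p' "$tmpdir/bookmarks.html")

printf '%s\n' "$sed_urls"
```

Because nothing is edited in place here, the result can be redirected straight to bookmarks.txt without the separate sed -i step.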
 