Better text editor or viewer than Glogg or Cat?

None-yet

Member
Credits
862
Looking for a better text editor or viewer. I have been trying to use Glogg on both my Windows and my Kali machines, and it crashes on each more than 50% of the time. I have resorted to going back to just cat-ing files, but this is so slow. I am often checking server logs and other files that are 200 GB on the low end and much larger, such as today's at 526 GB. I need to open them as fast as possible, though I understand there is only so fast this can happen. But with Glogg I start a file opening, come back to check it 30 minutes later, and it has crashed, so I have to start the whole thing again. I have stopped splitting the files, because putting them back together often leaves an unwanted gap somewhere.

Any suggestions?
 


KGIII

Well-Known Member
Credits
3,386
I have good luck with Gedit and Featherpad.

Back in my Windows days, I was in love with Crimson Editor. Man, I'd love that ported to Linux.

In the terminal, I have yet to find anything that I prefer more than nano. Nano has some great features and has a tiny footprint.
 

None-yet

Member
Credits
862
I have no reason other than a possible personality clash with Nano. It is a good option. Nano and I are like two guys that for some reason just don't like each other. I may try it, however. Thanks for the suggestions.
 

khedger

Active Member
Credits
910
It's not clear exactly what you're doing with these log files, so there may be more effective ways of accomplishing your tasks than just opening the files, but have you tried the 'more' command? I don't know whether it has a file-size limitation. I also use vi or vim for editing files.
If you are searching for something in particular in the logs, you could try using 'grep' to search for keywords or strings.
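For instance (the sample file and its contents are invented here), grep can pull out only the matching lines with their line numbers, while more/less page through a file without loading it all into memory:

```shell
# A tiny sample log to demonstrate on (contents invented):
printf 'GET /index ok\nERROR: auth failed\nGET /home ok\n' > sample.log

# Pull out only the lines that mention a keyword, with line numbers:
grep -n "ERROR" sample.log
# -> 2:ERROR: auth failed

# For interactive paging of a huge file, less/more only read what
# they need to display, so they start instantly:
#   less sample.log
```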

keith
 

stan

Member
Credits
676
Even with 96GB of RAM, it doesn't seem surprising that you're choking on a 526GB text file. It might help to increase swap (Linux) or pagefile (Windows), but it sure seems that you are pushing the boundaries of memory for a standard home or office PC, even a high-end one such as yours.

This editor may work (Windows only, free and pro versions). This article lists a couple of text file viewers; file sizes up to 30 GB are mentioned, so they may not be powerful enough for you.
 

Condobloke

Well-Known Member
Credits
5,059
How sure are you that the text file is really in GB?

How many pages is a GB of text?
677,963 pages

Text files, on average, consist of a whopping 677,963 pages per gigabyte.

Therefore, your text file consists of approximately 356,575,400 pages?
 

khedger

Active Member
Credits
910
Another approach I might employ would be to adjust the app that's logging the files to create new files more often, so that you end up with say, 24 hourly logs instead of one big daily log....
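If the logging app itself can't be changed, one standard way to get that effect on Linux is logrotate; this is only a sketch, with a hypothetical log path, and note that the `hourly` directive requires logrotate to be invoked hourly from cron:

```
# /etc/logrotate.d/app -- hypothetical path and app name
/var/log/app/access.log {
    hourly          # needs logrotate run hourly from cron
    rotate 24       # keep 24 rotated files (one day's worth)
    compress
    missingok
    notifempty
}
```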

keith
 

None-yet

Member
Credits
862
Great input, and I thank you. The log is pulled on average every 10 hours. One company I host for requires certain parameters before I upload their files to them. They want all of their sites on one log file with one date, for easy filing on their end. The CEO enforces this, which I assume comes from his old-fashioned filing system: when you pull a file from your file cabinet you get one file per day, not parts B, C and D. Makes sense to him. I agree with you here and have spoken to them, and was told to bring in a new price for the extra trouble, but that's the way it will be done. So I did. He has 14 major sites, ranging from one million average visits per day up to 20 million. A couple of the sites get hit hard hourly, attack-wise.

Current size for the file born yesterday and running till now is SIZE ON DISK 399 GB (429,468,848,128 bytes); last Memorial Day's was SIZE ON DISK 648 GB (679,468,848,128 bytes), and we have them upwards of 800 GB on occasion. Since we never break them down into pages, that is something I have never looked into.

Vim or Gvim just take too long. I do have to open these all the way, because I am required to highlight security hits that concern me. I am still learning sed, awk, gawk and so on, and yes, some of this they would make easier.

One thing their software does that irks me is that it cannot log some numbers correctly; they are supposed to be supplying a fix for this, but I have been waiting since June. Part of the issue is how it logs a time stamp, such as: 0514-9172020-052-24855345. Broken down, that number is (time - date - seconds - placeholder for event). This is their own software. It provides no link, so I am required to manually scroll to the point marked "0514-9172020-052-24855345", look at the event, and provide an explanation. I may find, for instance, that the log files the event at "0514-9172020-052-24855345", but when I track it I find that it actually took place at "0514-9272020-052-24855345", or that the placeholder is wrong, such as "0514-9172020-052-24855500", so I must reconcile these and manually make the changes.

I have been trying to figure out on my own how to use sed or awk to make these changes without ever having to sit here and open the files. As much as I love to work things out on my own, I am close to asking for help here as to how this can be done. Might as well do so now.

The logged time stamp is always on a line of its own, such as the below.

0514-9172020-052-24855345

Event xxx xx xxxxxxx x xxx xxxxx x xxxx. - Then at the report summary I double-check, and it shows the Event Marker of:

0514-9172020-052-24855500
Event xxx xx xxxxxxx x xxx xxxxx x xxxx

Meaning I need to decide which one is correct, which I can do by looking elsewhere in my own log (the one they will not accept).

Then I must make changes to "0514-9172020-052-24855500". Using a stream editor would make this task easier, by calling a change to a given line, or even having it search for and locate the correct one and make the incorrect one reflect the correct info, all without opening the file. I would still need to open it for other reasons, but that would be one task out of the way.

Like with awk or something: have it search for 0514-9172020-052-24855345 and change it to 0514-9172020-052-24855500. Something else this software does: in August, where it was supposed to display an 8, it kept displaying a 7 until it reached 8-19; from then on it correctly wrote 8-19 instead of 7-19.

I have nothing to do with their software; I am only told to make the corrections while they work on it. The last conversation on this issue was "well, if you are spending 9 hours a week on your end then just charge for it". They want it correct no matter how much it messes with me and whatever else I may have going on that day.

If someone would like to help me with this part then many thanks. As for the above suggestions I still am going to try some of them today. I have a lot of meetings today but plan on checking back.

Thanks for your time in reading this.
 

khedger

Active Member
Credits
910
I feel your pain. When investigating these other commands like sed and awk, make sure to check out grep thoroughly. If you get to the point where you know what you're looking for in a file, grep can help you find the records that contain it, AND get at the records before and after the one where your data is found. It's VERY powerful and handy!
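For example, grep's -B and -A flags print lines of context before and after each match (the sample file here is invented, using the timestamp format from the earlier post):

```shell
# A tiny sample standing in for a huge log (contents invented):
printf 'line1\nline2\n0514-9172020-052-24855345\nEvent details\nline5\n' > sample.log

# Show each match plus 1 line before (-B) and 1 line after (-A) it:
grep -B 1 -A 1 "0514-9172020-052-24855345" sample.log
# -> line2
# -> 0514-9172020-052-24855345
# -> Event details

# -C gives the same amount of context on both sides, e.g. grep -C 3
```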

keith
 

None-yet

Member
Credits
862
Thanks for the encouragement. I will get it. This is my busy time of the year, so reading time is at a premium.
 

JasKinasis

Well-Known Member
Credits
4,005
Hmmm... Vim is my editor of choice, but it doesn't handle huge files very well at all.

Does Emacs work with huge files??
I know how to use emacs quite well, but I’ve never tried using it with insanely huge files, so I don’t know offhand......

With large files, I usually split and join them.
You can use sed or awk to remove blank/empty lines before joining them.
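As a sketch of that split/clean/join flow (file names invented, and shown on a tiny sample; a real log would use a much larger line count, e.g. split -l 50000000):

```shell
# Sample file with some blank lines in it (contents invented):
printf 'one\n\ntwo\nthree\n\nfour\n' > big.log

# Split into pieces of 2 lines each, named part_aa, part_ab, ...
split -l 2 big.log part_

# Remove blank/empty lines from each piece in place:
sed -i '/^$/d' part_*

# Rejoin in order -- cat adds no extra bytes between pieces:
cat part_* > rejoined.log
```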

For viewing large files (in order to identify patterns that can be used with sed or awk) I'll use either w3m or the less command.

For searching through large files quickly, I’d recommend ag (Aka the silver searcher) - it’s a multi-threaded grep on steroids. It typically gets results much more quickly than grep. So it can be a time saver.

For editing large files, sed is my usual tool of choice.
For generating reports on a large file, I’ll use awk if there are records that can be extracted and parsed and sent to a file. Otherwise, I’ll just use ag, or grep to extract the most relevant parts of the file and redirect that to another file.

But that approach involves using a lot of arcane regex syntax.... It can be a pain sometimes. I often find myself having to look up how to do certain things when using regexes!
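As a rough sketch of that awk approach (the record format here is invented, not the OP's actual log layout): pull each matching record out with its line number and send the result to a report file:

```shell
# Sample records (format invented for illustration):
printf 'Event alpha\nnoise line\nEvent beta\n' > sample.log

# Print each "Event" record with its line number, into a report file:
awk '/^Event/ { print NR": "$0 }' sample.log > event_report.txt

cat event_report.txt
# -> 1: Event alpha
# -> 3: Event beta
```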

I have also heard that the CudaText editor works well with large files. But I've never tried it because it's a GUI-only editor, and I prefer to keep things like this in the terminal.
From what I recall it's a pretty powerful but lightweight editor, part of the Free Pascal project I think.
 
Last edited:

None-yet

Member
Credits
862
With Vim, it's just slow opening. Glogg was opening them in a fraction of the time Vim would. But with Glogg it either crashed or froze. For standard stuff, yes it is great.

With the file, They insist on one whole file. Split it once and got a call from their IT guy. They have to have it how they want it.
 

rado84

Well-Known Member
Credits
484
In my four and a half years of using Linux I tried all the text editors I could find, and only Gedit met all of my expectations. Why? Because it properly reads all kinds of encodings without me having to do anything. It doesn't even have a setting to choose the encoding, but it doesn't need one anyway.
 

JasKinasis

Well-Known Member
Credits
4,005
With Vim, it's just slow opening. Glogg was opening them in a fraction of the time Vim would. But with Glogg it either crashed or froze. For standard stuff, yes it is great.
I know all about it - it's been a long-standing issue with vim.
I wonder if the neovim/nvim developers have done anything to solve the issue of opening large files?!
I'd forgotten about nvim.... I haven't tried it since it was in its early days - I might try it later.

With the file, They insist on one whole file. Split it once and got a call from their IT guy. They have to have it how they want it.
Ah right! So you can't get away with splitting the file whilst you're processing it and then rejoining it afterwards then?! That's a pain.
Without splitting the file - it really only leaves using standard tools like less, grep/ag, sed, awk etc etc.

In my four and a half years of using Linux I tried all the text editors I could find, and only Gedit met all of my expectations. Why? Because it properly reads all kinds of encodings without me having to do anything. It doesn't even have a setting to choose the encoding, but it doesn't need one anyway.
Unfortunately, Gedit isn't really any better at opening insanely large files, so it won't be of any use to the OP.
 

f33dm3bits

Gold Member
Gold Supporter
Credits
7,527
I would have a look at Logstash if I were you; that way you have a lot more options, since it supports transforming and parsing the log format so that you can then transport it afterwards.
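As a very rough sketch of what a minimal Logstash pipeline looks like (the paths are placeholders, and the grok pattern is a generic stand-in, not a working parser for the OP's custom format):

```
input {
  file { path => "/path/to/huge.log" start_position => "beginning" }
}
filter {
  grok {
    # placeholder pattern -- a real one would match the custom
    # time-stamp layout described earlier in the thread
    match => { "message" => "%{DATA:event_ts}" }
  }
}
output {
  file { path => "/path/to/parsed.log" }
}
```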
 
Last edited:

JasKinasis

Well-Known Member
Credits
4,005
OK - in order to identify lines where there are timestamps in the log-file you can use the following egrep command:
Bash:
egrep -n "[0-9]{4}-[0-9]{7,8}-[0-9]{3}-[0-9]{8}" /path/to/file
That will show you every line in the file containing a time-stamp.

That's working on the assumption that we're looking at:
4 digits for the time, a dash -, 7 or 8 digits for the date (there will be 8 digits in dates from oct to dec), a dash, 3 digits, another dash, and then another 8 digits.

If you're going to use that egrep command a lot - you could add a temporary alias to your terminal session by entering the following command:
Bash:
alias findts='egrep -n "[0-9]{4}-[0-9]{7,8}-[0-9]{3}-[0-9]{8}"'
NOTE: Alternatively, if you add the above line to your .bashrc - it will also be available in any subsequent terminal sessions you open.

And then you would use it like this:
Bash:
findts /path/to/file
I called the alias findts - as in "find time-stamp". But you can call the alias whatever you like - just don't go calling it firefox, or something else that is already in use! Ha ha!

Obviously, if you need to redirect the output from the above egrep command (or its findts alias) to a separate file, so you have a file that indexes the time-stamps and their positions in the original log-file - you can do that quite simply too.
Just add > /path/to/ts_index-file after /path/to/file.

Then you can grep through your index file to find the line numbers of a specific time-stamp, if you do have any cross-referencing to do for this.

I'm a bit fuzzy on some of the details of what you're doing. It was a lot to take in.
Do you have your own log of events that you're cross referencing? Or have they given you some kind of list of events?

Anyway - once you have identified where the timestamps are in the log-file, you might want to look at a range of lines in the file to view some of the other details in that particular log entry.

A while ago, I created a trivial script that uses sed to display a range of lines in a file.
So this might come in handy:
I have this saved in my personal bin directory (~/bin/) as range:
Bash:
#!/usr/bin/env bash
if [[ $# -ne 3 ]]; then
    echo "USAGE: $0 {line-from} {line-to} {file}"
    exit 1
fi
if [[ ! -f "$3" ]]; then
    echo "File does not exist!"
    exit 1
fi
# use sed to print a range of lines from a file
sed -n "$1,$2p" "$3"
Put that somewhere in your $PATH (again, I use my personal bin directory, so it's only available to me - not other users on my PC).
And run it like:
Bash:
range 5 10 /path/to/file
And it will show lines 5 to 10 of /path/to/file.

Notes:
Because range is a quick and dirty little tool I wrote for myself - I've done minimal parsing of parameters.
No validation is performed on the line-from and line-to parameters.
It's not remotely bullet-proof - I've just done the bare minimum to make it useful to me.
If I run it with no parameters, or the wrong number of parameters - the script gives me some usage information to jog my memory.
And if the file specified in parameter 3 to the script does not exist, it bombs out with an error!

So that might come in handy for you. I typically only use it for viewing snippets of normal sized files. I don't deal with immense files too often.
When looking through a large file - In order to view 10 lines from e.g. line 5999997 to line 6000007, you'd have to run the range script like this:
Bash:
range 5999997 6000007 /path/to/file
Which is a lot of typing and a bit of a PITA when dealing with large line-numbers. So, it could be improved.
For example -
I have been thinking for a while that it would be nicer (and less typing) to be able to allow the second parameter to be something like +10.
e.g.
Code:
range 5999997 +10 /path/to/file
So if the second parameter starts with a +, it will be interpreted as a relative number of lines and will display that many lines from the start line.
If the second parameter is just a number it will be interpreted as a literal line number.
But unfortunately - I haven't got around to making that change yet! Sorry!
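A minimal sketch of that relative-range idea, written here as a shell function for brevity, with the same bare-minimum validation as the original script:

```shell
# range2: like range, but if the second argument starts with '+',
# it is treated as a count of lines relative to line-from.
range2() {
    local from=$1 to=$2 file=$3
    if [[ $to == +* ]]; then
        # strip the '+' and convert to an absolute end line
        to=$(( from + ${to#+} - 1 ))
    fi
    sed -n "${from},${to}p" "$file"
}

# Example: show 3 lines starting at line 2 of a sample file
printf 'a\nb\nc\nd\ne\n' > sample.txt
range2 2 +3 sample.txt
# -> b
# -> c
# -> d
```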

Anyway - those are a couple of initial tool ideas that might help you to speed up the process of tracking down/identifying some of the entries you're looking for in these log-files.

WRT the edits:
If you have identified the time-stamp and you know what it should be:
e.g. From your earlier post
0514-9172020-052-24855345 and change it to
0514-9172020-052-24855500
Assuming that the time-stamps are unique - and that there aren't several entries with the same time-stamp - that should be a simple sed one-liner:
Bash:
sed -i 's/0514-9172020-052-24855345/0514-9172020-052-24855500/' /path/to/file
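One extra safety step worth adding (sample file invented here): before running an in-place edit like that, grep -c can confirm the timestamp really is unique, since it counts the matching lines:

```shell
# Sample log containing the timestamp once (contents invented):
printf 'x\n0514-9172020-052-24855345\ny\n' > sample.log

# Anything other than 1 here means a blind substitution could
# touch the wrong entry:
grep -c "0514-9172020-052-24855345" sample.log
# -> 1
```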
If you have identified several timestamps that need changing - it won't be pretty, but you can do multiple edits with a single pass of sed:
e.g.
Bash:
sed -i 's/0514-9172020-052-24855345/0514-9172020-052-24855500/;s/aaaa-aaaaaaa-aaa-aaaaaaaa/bbbb-bbbbbbb-bbb-bbbbbbbb/;s/i_think_you_get_the_idea/etc_etc_etc/;' /path/to/file
Something like that will reduce the number of times you run sed to edit the file, so the file will be overwritten fewer times. But you're looking at a lot of potentially error-prone typing and/or copy-pasta!

But if the time-stamps are NOT unique to each entry - then it becomes even more complicated, because we need to look at the information in each entry and then determine from context which one/ones we need to edit.

It's baby-steps at the moment, and it might take some time. But we should hopefully be able to knock up some scripts that can at least take some of the pain out of performing this task. And if we're lucky, we might be able to semi-automate some of this process, if not completely automate it.

This does seem like a very labour intensive manual task. And they haven't exactly made this easy for you with their crappy log-format, or their buggy software writing out incorrect information.
 

f33dm3bits

Gold Member
Gold Supporter
Credits
7,527
Off-topic: @JasKinasis Your replies are always so detailed, you should write a book about a topic you are passionate about, or teach a class in your area of expertise :) If you write this much here for free, I can only imagine the work you do for your boss, and I know for sure your boss doesn't pay you enough!
 

KGIII

Well-Known Member
Credits
3,386
Your replies are always so detailed, ...
As I was reading their reply, I was writing a similar response in my head. I didn't get very far because I noticed your response.

It's always a treat to read their responses. I've made it a habit to look for 'em. I especially appreciate the bash/scripting responses. There's always so much more for me to learn.
 

