Posting this in Kali since this is manly going to be used as a Kali type tool.
Code purpose? 1- A teaching tool to provide useful info to schools, nursing homes, Churches, libraries and more where I volunteer to show one method used by people to generate info to crack passwords in a visual manner that they can see and learn to better enable their understanding of how such people think. 2- To provide an easier method teachers can generate word-lists in their classroom based on their own individual students cognitive and learning abilities rather than using completely generic word-lists handed down as a one size fits all type of tool based on where students should be as opposed to where they really are based on a teachers own judgement. 3- My own growth in my field of employment as well.
Here is what I have written:
I have written, not sure if you would call it a program, app or a script. Any ways, using what I have learned here on this forum and doing some searching to come up with a manner using sed mostly to search through files on my computer and have this app generate a wordlist that meets my specific needs.
I do not know how much detail I need to go into here because I am sure you can look at the included attachment and see what I am trying to do. I will give a quick overview for anyone who visits and, like we all were (and I still am) don’t have a clue.
This I wanted it to be able to search a document(s) as (input) and create another (output) document that had every word one per line by using all spaces as a marker and then using those markers to insure each word was on a separate line, in alfa order ignoring all punctuation and numbers. This will make everything lower-case and delete any blank lines also. The output file will be ascii encoding so that special charsets would be ignored as well as non-English chars. It should delete any line with the @ symbol and http so as to prevent email addresses and websites from being included in the output file. Finally it would alpha organize the words and remove any duplicate words.
I am having issues with non- punctuation chars showing up such as <, #, %, / and more. I could deal with these in the same manner that I dealt with @ and http however I was thinking there could be a better method of doing this.
Other issues I am having that I have not been able to deal with is:
In the end the output should provide a nice sorted one word per line file.
I would like for you’s to pick it apart and let me know if I could have done any part of this in a different way, or whatever. Like in my introduction post I am here to learn so completely destroying what I have done is fine with me. I need the knowledge so if you see a better way I would like to know.
Other things I plan on adding is the capability to compare the output file with another list and not print out words I already have on that list.
I will throw in a question. Can this be made into a program so I can just click it and have it do everything by a click rather than having to put it in the command-line. I considered using ./ but decided to wait until I got feedback before doing it. I fully intend for everything to be in a single folder.
I understand this is a lot to ask so if it takes time to get an answer I understand. I so much wanted to find ways to deal with these issues on my own but after 5 days I thought maybe I needed to ask for help. I think I could deal with some of these issues by constantly repeating some sed commands but didn’t know if it would be good form to do so.
Thanks for any and all help anyone can supply. Over all I think I did pretty good but came up short. Instead of breaking this down into several posts my thinking is if anyone else comes through that having it all in one post would be more beneficial to them as well.
Thanks!
Code purpose? 1- A teaching tool to provide useful info to schools, nursing homes, Churches, libraries and more where I volunteer to show one method used by people to generate info to crack passwords in a visual manner that they can see and learn to better enable their understanding of how such people think. 2- To provide an easier method teachers can generate word-lists in their classroom based on their own individual students cognitive and learning abilities rather than using completely generic word-lists handed down as a one size fits all type of tool based on where students should be as opposed to where they really are based on a teachers own judgement. 3- My own growth in my field of employment as well.
Here is what I have written:
Code:
cat *.txt |
perl -nle 'print if m{^[[:ascii:]]+$}' |
sed '/^@/d' |
sed '/^http/d' |
sed 's/[0-9]*//g' |
sed 's/ /\n/g' |
sed 's/\r/\n/g' |
sed 's/^[[:punct:]]*//' |
sed 's/[[:punct:]]*$//' |
tr '[:upper:]' '[:lower:]' | sort | uniq > 1-1.txt
I have written, not sure if you would call it a program, app or a script. Any ways, using what I have learned here on this forum and doing some searching to come up with a manner using sed mostly to search through files on my computer and have this app generate a wordlist that meets my specific needs.
I do not know how much detail I need to go into here because I am sure you can look at the included attachment and see what I am trying to do. I will give a quick overview for anyone who visits and, like we all were (and I still am) don’t have a clue.
This I wanted it to be able to search a document(s) as (input) and create another (output) document that had every word one per line by using all spaces as a marker and then using those markers to insure each word was on a separate line, in alfa order ignoring all punctuation and numbers. This will make everything lower-case and delete any blank lines also. The output file will be ascii encoding so that special charsets would be ignored as well as non-English chars. It should delete any line with the @ symbol and http so as to prevent email addresses and websites from being included in the output file. Finally it would alpha organize the words and remove any duplicate words.
I am having issues with non- punctuation chars showing up such as <, #, %, / and more. I could deal with these in the same manner that I dealt with @ and http however I was thinking there could be a better method of doing this.
Other issues I am having that I have not been able to deal with is:
- Although it does deal with the blank lines it always has line number one as a blank line.
- Most of the words are formatted as I want I am getting some words that are 3 spaces back while some others are as much as 15 spaces back.
- I get a black background box with 2 to 4 letters and I have no idea why.
- I like the idea of having “-“ being used like this “aluminum-extruded” but I also will have a hyphen showing up in ways I don’t want it.
- I am getting things like “amanue!oglvee!norm” that I do not understand why. Being a typo I get but I did a search in this folder and that did not come in as a result.
- I understand why things like this happen “amateur/commercial/educational” but not sure if or how to I can break them up where the “/” would be used as a marker for a new line like I use the space’s as a marker for a new line.
- I also get things like this “ambitinternational” but again searching the documents I do not get this as a result.
- I get things like “asharp-inchscreen,powerfulintelatomprocessorandanmpshooterwitharangeofmodes” and it blows my mind on how to stop this
In the end the output should provide a nice sorted one word per line file.
I would like for you’s to pick it apart and let me know if I could have done any part of this in a different way, or whatever. Like in my introduction post I am here to learn so completely destroying what I have done is fine with me. I need the knowledge so if you see a better way I would like to know.
Other things I plan on adding is the capability to compare the output file with another list and not print out words I already have on that list.
I will throw in a question. Can this be made into a program so I can just click it and have it do everything by a click rather than having to put it in the command-line. I considered using ./ but decided to wait until I got feedback before doing it. I fully intend for everything to be in a single folder.
I understand this is a lot to ask so if it takes time to get an answer I understand. I so much wanted to find ways to deal with these issues on my own but after 5 days I thought maybe I needed to ask for help. I think I could deal with some of these issues by constantly repeating some sed commands but didn’t know if it would be good form to do so.
Thanks for any and all help anyone can supply. Over all I think I did pretty good but came up short. Instead of breaking this down into several posts my thinking is if anyone else comes through that having it all in one post would be more beneficial to them as well.
Thanks!