Line Parsing Reminder (Duplicate removal)

Andrew Bolster

Senior R&D Manager (Data Science) at Synopsys Software Integrity Group and Treasurer @ Bsides Belfast and NI OpenGovernment Network

So, say you have a long list of instruction (like multiple apt-get install lines) and you want to eliminate common words?

Easiest way to do it is (assuming you have all of the instrustions in “list.txt”)

[FYI the ‘' character indicates a continuation of a single line ]

cat list.txt\

tr ‘ ‘ ‘\n’ \            #Expands all space characters to new lines
sort uniq \    #sorts each line, and then eliminates duplicates
tr ‘\n’ ‘‘               #turns all the new-lines into spaces

Depending on the actual content, it may be necessary to remove specific entries, (such as apt-get or sudo). Thats an exercise for the reader.

blog comments powered by Disqus