How to Delete Lines Not Containing a Word in Notepad++
Computers are supposed to make things easier for us. One simple example is deleting the lines in a text file that don’t contain a specific keyword. The task is a no-brainer, but doing it by hand is very time consuming and tedious. Recently I spent some time compiling a list of websites that have copied articles from this blog and published them on their own sites. Although Google does a pretty good job of determining the original publisher, it is still a robot based on a bunch of constantly changing algorithms that can make, and has made, mistakes. Searching by hand for websites that have copied posts from here is very time consuming, so I used Copyscape Premium to run a batch scan on all 2000 articles on this website and track down plagiarized content.
Copyscape Premium finished scanning all 2000 posts in just 10 hours and I was able to export the results to a CSV file for further investigation. There are over 20,000 URLs in the list and I want to categorize the websites based on their domain names. Not all websites on the list are copycats, but most of those hosted on free hosts such as blogspot/blogger/wordpress are either scrapers or copy-pasters. Once the URLs are categorized, I can concentrate on filing DMCA complaints with Blogger first, followed by WordPress, instead of jumping back and forth.
Linux users can easily delete lines that don’t contain specific words by using the ex global command (for example, :v/blogspot\.com/d in vi), but on Windows we need a separate program to do that. Since I’m a Notepad++ user, I discovered that Notepad++ can automatically delete every line in which a word you specify is not present. Here is an example of how to remove the lines that don’t contain “blogspot.com”; in other words, I only want to keep the lines that contain “blogspot.com”.
1. Run Notepad++, then either open the text file that you want to edit or paste the text into the empty page.
2. Go to the Search menu and select Find.
3. Go to the Mark tab, check the Bookmark line checkbox, enter blogspot.com in the Find what box, and click the Mark All button. A blue icon will be added to every line that contains blogspot.com.
4. Close the Mark window.
5. Go to the Search menu > Bookmark and select Remove Unmarked Lines.
If the text file that you’re editing is very large, it may take a while for the process to complete. Alternatively, you can also select Remove Bookmarked Lines from Search > Bookmark if you’re trying to delete lines that contain the words that you specify. Please see the embedded video below if you’re having trouble following the step-by-step instructions on how to delete lines without the keywords using Notepad++.
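For readers who would rather script the same operation, here is a minimal Python sketch of the idea behind the bookmark technique. It is not how Notepad++ does it internally. The function name filter_lines and its keep_matching parameter are made up for this illustration, and the match is a plain case-sensitive substring test, which may differ from the Match case setting chosen in the Mark tab.

```python
from typing import Iterable, List


def filter_lines(lines: Iterable[str], keyword: str, keep_matching: bool = True) -> List[str]:
    """Keep the lines that contain `keyword`.

    With keep_matching=False the selection is inverted, which mirrors
    using Remove Bookmarked Lines instead of Remove Unmarked Lines.
    The comparison is a case-sensitive substring test (an assumption
    of this sketch).
    """
    return [line for line in lines if (keyword in line) == keep_matching]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the exported URL list.
    sample = [
        "http://example.blogspot.com/post-1",
        "http://example.wordpress.com/post-2",
        "http://another.blogspot.com/post-3",
    ]
    for line in filter_lines(sample, "blogspot.com"):
        print(line)
```

To process a real file, the lines could be read with open(path, encoding="utf-8") and passed to the function, then written back out to a new file so the original stays untouched.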
Way easier than using regex. Thanks!
You are amazing!
You are literally the savior of my day! Thank you
Thanks so much for this great tip! And thanks for all your website! P., Brussels, Belgium.
Thx!
Thanks a lot .. it really helped
Excellent post. Thank you!
Very useful. Thanks for sharing!
I really appreciate it. It’s a good time saver.
This is a Great Help
Thank you very much …
For 3 words you can write (word1|word2|word3), with Search Mode set to Regular expression.
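As a rough illustration of the alternation tip above, the following Python sketch keeps the lines that contain any one of several words. The helper name keep_lines_matching_any is hypothetical. Each word is passed through re.escape so that characters such as the dot in blogspot.com are matched literally. That is a choice made for this sketch: in a hand-written regular expression, an unescaped dot matches any character.

```python
import re
from typing import Iterable, List


def keep_lines_matching_any(lines: Iterable[str], words: Iterable[str]) -> List[str]:
    """Keep the lines that contain at least one of `words`."""
    words = list(words)
    if not words:
        # An empty alternation would match every line, so return nothing instead.
        return []
    # Builds a pattern equivalent to (word1|word2|word3), with each word escaped.
    pattern = re.compile("|".join(re.escape(word) for word in words))
    return [line for line in lines if pattern.search(line)]
```

For example, passing the words blogspot.com and wordpress.com keeps the lines for both hosts in a single pass instead of two separate Mark operations.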
Very useful Tutorial — Thank you!
For my text processing project, I need to do an almost symmetrical procedure: find all occurrences of specific words in a file and push each found word to the end (or beginning) of its line. Since there are already more than 90K lines and counting, it’s impossible to do by hand. I was thinking of first marking the lines containing a specific word and eliminating the other lines, then repeating for the other keywords, and finally merging the new files, but this approach is too awkward. Could you suggest some NP++ hint that might help?
Thank you in advance for your time.
Thank you sir! This is exactly what I needed. Now I have a few extra hours to spare.
Thank you!! *HUGE* time saver!!
Thanks for sharing! Real time saver ;)
Thanks Ray.
This is a wonderful tool, will save me a lot of manual work.
You are quite right to stand up for your rights, Ray. Thumbs up from me.
Hi Raymon,
Thanks for this tip, I didn’t know about it. The last time I searched for how to do this in Notepad++, I found an article showing how to do it using TextFX Viz and its option to hide the lines matching the text from the clipboard. It works, but it seems harder to use compared to this bookmark technique.
Why reinvent the wheel ?
type input.txt | find "blogspot.com" > result.txt
(only lines with "blogspot.com")
type input.txt | find /V "blogspot.com" > result.txt
(only lines without "blogspot.com")
FIND is case sensitive. Use /I for a case-insensitive search.
With *real* DOS there was a 64k limit. Don’t know with more recent version ;-) …
Thanks Bernard for sharing this knowledge with all of us!