
Unix & Linux Commands Cookbook

2025-02-19   workflow unix programming


HUM307 Command-line Tips and Tricks from Brian Kernighan (archived)

Problem solving with Unix commands Vegard Stikbakke (archived)


Download a whole website

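wget -w 2 -r -np -k -p www.example.com

-w 2 (wait 2 seconds between requests), -r (recursive), -np (never ascend to the parent directory), -k (convert links for local viewing), -p (also fetch images, CSS, and other page requisites)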

Extract text of a webpage

curl www.google.com | lynx -dump -stdin

-dump (extract the text), -stdin (read from the pipe)



Search a directory tree for a word in file names (depth limited to 2, case-insensitive)

tree -L 2 | grep 'lisp' -i

-L: (limit the depth), -i: (ignore case)



Find files modified within last 3 days

find . -mtime -3

-mtime -3 (modified within last 3 days)

  • Use +3 instead to find files last modified more than 3 days ago (find counts age in whole 24-hour periods, so in practice this means at least 4 days old)
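To see how the sign in front of the number changes the match, here is a minimal sketch on a scratch directory. The file names are made up for the demo, and the old file is backdated with `touch -t` so the result does not depend on when you run it.

```shell
# Scratch directory with one fresh file and one backdated file (hypothetical names)
demo=$(mktemp -d)
touch "$demo/fresh.txt"
touch -t 200001010000 "$demo/stale.txt"   # set mtime to 2000-01-01 00:00

# -mtime -3: modified less than 3 days ago
recent=$(cd "$demo" && find . -type f -mtime -3)
# -mtime +3: modified more than 3 days ago
old=$(cd "$demo" && find . -type f -mtime +3)

echo "recent: $recent"
echo "old:    $old"
```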

Append the same line to all Markdown files in a directory, recursively

find . -type f -name "*.md" -exec sh -c 'echo "This line will be added to all" >> "$1"' sh {} \;
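Running the command above twice appends the line twice. A possible variant, sketched here under the assumption that the line should appear at most once per file, checks with `grep -qxF` first. The function name `append_once` and the sample files are hypothetical.

```shell
# Scratch tree with two Markdown files (hypothetical names)
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'first\n'  > "$demo/a.md"
printf 'second\n' > "$demo/sub/b.md"

line='This line will be added to all'

append_once() {
  # Append $line to every *.md file under directory $1,
  # skipping files that already contain it as a whole line.
  find "$1" -type f -name '*.md' -exec sh -c '
    line=$1; shift
    for f in "$@"; do
      grep -qxF -- "$line" "$f" || printf "%s\n" "$line" >> "$f"
    done
  ' sh "$line" {} +
}

append_once "$demo"
append_once "$demo"   # second run changes nothing

cat "$demo/sub/b.md"
```

Passing the line as an argument instead of embedding it in the `sh -c` script keeps quoting simple when the text contains quotes or dollar signs.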

Move all files in subdirectories to the current path

find . -mindepth 2 -type f -exec mv {} . \;
  • -mindepth 2 ensures we only get files in subdirs (not those already in current dir)
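One caveat: if two subdirectories contain a file with the same name, a plain mv silently overwrites the first with the second. A cautious sketch, assuming your mv supports `-n` (no-clobber, available in GNU and BSD mv), leaves the clashing file where it is so you can deal with it by hand. The directory and file names are invented for the demo.

```shell
# Two subdirectories that both contain notes.txt (hypothetical names)
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
echo 'from a' > "$demo/a/notes.txt"
echo 'from b' > "$demo/b/notes.txt"
echo 'unique' > "$demo/b/only.txt"

# -n: never overwrite an existing destination file
(cd "$demo" && find . -mindepth 2 -type f -exec mv -n {} . \;)

# Anything still inside a subdirectory was skipped because of a name clash
leftover=$(cd "$demo" && find . -mindepth 2 -type f | wc -l | tr -d ' ')
echo "files left behind: $leftover"
```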


Remove all empty directories

find . -mindepth 1 -type d -empty -delete
  • -mindepth 1 ensures we don't try deleting the current directory
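Before deleting, it can help to preview with -print in place of -delete. Note that the preview may list fewer directories than the real run removes: -delete processes children before parents, so a directory that only contains empty directories becomes empty along the way. A small sketch with made-up directory names:

```shell
# empty/ contains only another empty directory; keep/ holds a file
demo=$(mktemp -d)
mkdir -p "$demo/empty/nested" "$demo/keep"
echo data > "$demo/keep/file.txt"

# Dry run: lists only directories that are empty right now (./empty/nested)
(cd "$demo" && find . -mindepth 1 -type d -empty -print)

# Real run: removes nested/, after which empty/ is empty and is removed too
(cd "$demo" && find . -mindepth 1 -type d -empty -delete)

remaining=$(cd "$demo" && find . -mindepth 1 -type d | sort)
echo "remaining: $remaining"
```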


Find large files over 100MB and sort them by size

find . -type f -size +100M -exec ls -lh {} \; | sort -rh -k5

-size +100M (files larger than 100MB), sort -rh (reverse human-readable sort)
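`sort -h` is a GNU extension. Where it is missing, one possible fallback is to list sizes in bytes and sort numerically. The sketch below lowers the threshold to 100 KiB so it runs quickly on tiny demo files (names are hypothetical), and it assumes file names without spaces because it picks fields with awk.

```shell
# Three demo files of 50 KiB, 200 KiB and 400 KiB
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/small.bin"  bs=1024 count=50  2>/dev/null
dd if=/dev/zero of="$demo/medium.bin" bs=1024 count=200 2>/dev/null
dd if=/dev/zero of="$demo/large.bin"  bs=1024 count=400 2>/dev/null

# ls -l prints the size in bytes in column 5; sort on it numerically, descending,
# then keep only the size and the file name (last column)
biggest_first=$(cd "$demo" && find . -type f -size +100k -exec ls -l {} + |
  sort -k5,5nr | awk '{ print $5, $NF }')

echo "$biggest_first"
```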



Find and replace text in multiple files

find . -type f -name "*.txt" -exec sed -i 's/oldtext/newtext/g' {} +
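In-place edits across many files are hard to undo, so a more careful routine might first list the files that would change and then keep backups. The sketch below uses `-i.bak` with the suffix attached, a form that both GNU and BSD sed accept. The sample files are made up.

```shell
# Two demo files, only one of which contains the search text
demo=$(mktemp -d)
echo 'some oldtext here' > "$demo/a.txt"
echo 'nothing to change' > "$demo/b.txt"

# Step 1: preview which files contain the text
would_change=$(cd "$demo" && find . -type f -name '*.txt' -exec grep -l 'oldtext' {} +)
echo "would change: $would_change"

# Step 2: replace in place, saving each original as <name>.bak
(cd "$demo" && find . -type f -name '*.txt' -exec sed -i.bak 's/oldtext/newtext/g' {} +)

cat "$demo/a.txt"
```

Once the result looks right, the backups can be removed with find . -name "*.bak" -delete.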

Find duplicate files based on content (not name)

find . -type f -exec md5sum {} \; | sort | uniq -w32 -dD

Create a simple HTTP server in current directory

python3 -m http.server 8080

Generate a tree view of directory excluding certain patterns

tree -I 'node_modules|cache|tmp|vendor|.git' --dirsfirst -aC

Remove empty lines from a file

sed -i '' '/^[[:space:]]*$/d' file-path
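The `sed -i ''` form is the BSD/macOS spelling; GNU sed takes `-i` without the empty argument. To try the pattern without touching a file, it can be run on standard input, as in this small sketch with made-up input:

```shell
# Input has an empty line and a line containing only spaces and a tab
cleaned=$(printf 'alpha\n\n  \t\nbeta\n' | sed '/^[[:space:]]*$/d')
echo "$cleaned"
```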

Find all URLs in a directory, strip http(s) prefixes and trailing slashes, and list them

find . -type f \( -name "*.txt" -o -name "*.md" \) -exec perl -lne 'print $1 while /(https?:\/\/[^\s)\]]+)/g' {} \; | sed -e 's|^https://||;s|^http://||' -e 's/\/$//'


Compare URLs in two files: extract the URLs from each file, strip http(s) prefixes and trailing slashes, and diff the results

capture_urls() { perl -lne 'print $1 while /(https?:\/\/[^\s)\]]+)/g' "$1" | sed -e 's|^https://||;s|^http://||' -e 's/\/$//'; }
diff <(capture_urls file1.md) <(capture_urls file2.md)
  1. diff <(...) <(...): Compare output of two commands

  2. For each file:

    • Extract URLs using perl (perl -lne 'print $1 while /(https?:\/\/[^\s)\]]+)/g')

    • Clean URLs by removing http(s) prefix and trailing slashes using sed

  3. Shows differences between the two files:

    • Lines starting with < appear only in file1

    • Lines starting with > appear only in file2

    • No output means the files contain the same URLs
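diff is sensitive to order, so two files with the same URLs in a different sequence still show differences. If only the set of URLs matters, one option is to sort the lists and compare them with comm. The sketch below makes two swaps that are not in the recipe above: `grep -oE` stands in for the perl one-liner, and temporary files stand in for process substitution so it also runs in plain sh. The sample files and URLs are invented.

```shell
demo=$(mktemp -d)
cat > "$demo/file1.md" <<'EOF'
See [docs](https://example.com/docs/) and http://example.org
EOF
cat > "$demo/file2.md" <<'EOF'
Mirror at https://example.org/ plus https://example.net/new
EOF

capture_urls() {
  # Extract URLs (stopping at whitespace, ')' or ']'), strip the scheme
  # and a trailing slash, then sort and de-duplicate.
  grep -oE 'https?://[^][:space:])]+' "$1" |
    sed -e 's|^https://||;s|^http://||' -e 's|/$||' |
    sort -u
}

capture_urls "$demo/file1.md" > "$demo/urls1"
capture_urls "$demo/file2.md" > "$demo/urls2"

only_in_1=$(comm -23 "$demo/urls1" "$demo/urls2")   # lines unique to file1
only_in_2=$(comm -13 "$demo/urls1" "$demo/urls2")   # lines unique to file2

echo "only in file1: $only_in_1"
echo "only in file2: $only_in_2"
```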



Use the clipboard in a pipeline (macOS)

pbpaste | <your-command>

Prettify YouTube transcripts

  1. Copy the transcript to the clipboard (it comes as text interleaved with timestamps and newlines)

  2. Run the following

pbpaste | sed -E 's/[0-9]+:[0-9]{2}//g' | tr '\n' ' '
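To try the cleanup without a clipboard, the same idea can be run on a literal sample. This sketch uses an invented transcript, also handles h:mm:ss timestamps, squeezes the blank lines left behind by removed timestamps, and trims the ends.

```shell
# Made-up transcript in the copied format: timestamp line, then text line
transcript=$(printf '0:00\nhello and welcome\n0:05\nto the channel\n12:34\nlet us begin\n')

# 1) drop m:ss and h:mm:ss timestamps
# 2) turn newlines into spaces, squeezing runs into one
# 3) trim leading and trailing spaces
pretty=$(printf '%s\n' "$transcript" |
  sed -E 's/[0-9]+:[0-9]{2}(:[0-9]{2})?//g' |
  tr -s '\n' ' ' |
  sed -e 's/^ *//' -e 's/ *$//')

echo "$pretty"
```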



Barış Özmen © 2025