I was reading LJ 150 and, in Dave Taylor's article "Analyzing Log Files",
I noticed that the order of commands in his examples uses more processor time than necessary.
At one point he gives the following command line to search for HTML files in the access_log:
awk '{ print }' access_log | sort | uniq -c | sort -rn | grep "\.html" | head
On my system this command takes:
real 0m0.097s
user 0m0.084s
sys 0m0.020s
If you put the grep command right after the awk command:
awk '{ print }' access_log | grep "\.html" | sort | uniq -c | sort -rn | head
the pipeline takes:
real 0m0.042s
user 0m0.028s
sys 0m0.012s
The reason this is faster is that the first pipeline sorts the whole dataset and only afterwards
uses grep to remove the non-.html entries. The second one (the one I suggest) removes all the non-.html entries first, so sort and uniq only have to process the lines that remain.
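
To check that moving grep forward changes only the cost and not the result, here is a rough sketch that runs both orderings on a small made-up log and compares their output. The sample data, the temporary file and the variable names (log, late, early) are my own assumptions for illustration. They are not from the article, and a real access_log will of course look different.

```shell
#!/bin/sh
# Hypothetical sample data: a tiny stand-in for access_log,
# with one request path per line (not a real Apache log format).
log=$(mktemp)
i=0
while [ "$i" -lt 2000 ]; do
    echo "/index.html"
    echo "/images/logo$((i % 50)).png"
    echo "/page$((i % 7)).html"
    i=$((i + 1))
done > "$log"

# Original order: sort and count everything, filter with grep at the end.
late=$(awk '{ print }' "$log" | sort | uniq -c | sort -rn | grep "\.html" | head)

# Suggested order: filter with grep first, then sort and count only the .html lines.
early=$(awk '{ print }' "$log" | grep "\.html" | sort | uniq -c | sort -rn | head)

# Both pipelines should report the same top entries.
if [ "$late" = "$early" ]; then
    echo "same output"
else
    echo "outputs differ"
fi

rm -f "$log"
```

The result is the same because grep only discards lines and uniq -c only prepends a count, so filtering before or after counting leaves the same set of .html lines in the same order. To compare timings on your own log, prefix each pipeline with time (in bash, time measures the whole pipeline). The difference grows with the share of non-.html lines in the log.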