MATOMO - Updating From Nginx Logs without duplicates

From wiki.1001solutions.net
Revision as of 12:06, 26 March 2024 by Z


Log Deduplication Problem

Log Import: Avoid Importing Duplicates

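The Community Edition of Matomo has no built-in way to skip log lines that were already imported, so importing the same log file twice records the same hits twice.
The workaround used here is to import from the log files and use the --exclude-older-than option to ignore every entry older than the previous run.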
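
The value passed to --exclude-older-than must have the form YYYY-MM-DD hh:mm:ss +/-0000, and the timezone offset is mandatory. As a minimal sketch, the following shows one way to produce such a value and check its shape before handing it to the importer. The function names matomo_timestamp and is_valid_matomo_timestamp are hypothetical helpers, not part of Matomo, and the check only validates the layout of the string, not whether the date exists.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Matomo) for the --exclude-older-than value.

# Print the current local time as "YYYY-MM-DD hh:mm:ss +/-0000".
matomo_timestamp() {
    date +"%Y-%m-%d %H:%M:%S %z"
}

# Return 0 if $1 has the shape "YYYY-MM-DD hh:mm:ss +/-0000", 1 otherwise.
# Only the layout is checked; "2024-99-99 ..." would still pass.
is_valid_matomo_timestamp() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}$'
}

now=$(matomo_timestamp)
if is_valid_matomo_timestamp "$now"; then
    echo "usable with --exclude-older-than: $now"
else
    echo "unexpected date format: $now" >&2
fi
```

A check like this is mostly useful for the value read back from a file, which may be empty or truncated if an earlier run was interrupted.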


Bash Script Example

This quick-and-dirty script stores a timestamp in a file. On each run it imports the logs with the bundled script /var/www/html/matomo/misc/log-analytics/import_logs.py and excludes every entry older than the timestamp saved by the previous run.

#!/bin/bash
#
#	--exclude-older-than EXCLUDE_OLDER_THAN
#		Ignore logs older than the specified date. Exclusive.
#		Date format must be YYYY-MM-DD hh:mm:ss +/-0000.
#		The timezone offset is required.
#
#	To print the date in this format on Linux: date +"%Y-%m-%d %H:%M:%S %z"


# VARIABLES
SLEEP_TIME=1
TIMESTAMP_FILE="/root/last_run_timestamp_for_matomo.nfo"
LOG_PATH="/var/log/matomo-archive.log"


# GET TIMESTAMP OF LAST CHECK FROM FILE
TIMESTAMP=$(cat "$TIMESTAMP_FILE")
echo "$TIMESTAMP"

# GET CURRENT TIMESTAMP
NEW_TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S %z")

# BASE PYTHON IMPORT COMMAND (site-specific options are appended below)
CUSTOM_COMMAND="python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --accept-invalid-ssl-certificate --url=http://matomo.lanv --recorders=6 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --debug-tracker"


# RSYNC REVERSE PROXY LOGS TO /tmp
# The remote globs are quoted so that the local shell never tries to expand them.
logger "##### MATOMO SCRIPT : Beginning script"
logger "##### MATOMO SCRIPT : Beginning rsync reverse proxy logs..."
rsync -arvz -e "ssh -p 22" "matomo@10.24.0.1:/var/log/nginx/*example.org.access.log*" /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 22" "matomo@10.24.0.1:/var/log/nginx/*example.org.access.log*.1" /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 22" "matomo@10.24.0.1:/var/log/nginx/*example2.com.access.log*" /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 22" "matomo@10.24.0.1:/var/log/nginx/*example2.com.access.log*.1" /tmp/ >> "$LOG_PATH"


# IMPORTING LOGS
logger "##### MATOMO SCRIPT : Beginning import matomo.example.org"
$CUSTOM_COMMAND --exclude-older-than="$TIMESTAMP" --idsite=1 /tmp/matomo.example.org.access.log* >> "$LOG_PATH"
sleep $SLEEP_TIME

# AND SO ON...
# ...
# ...
# ...


logger "##### MATOMO SCRIPT : Beginning archiving"
cd /var/www/html/matomo && php console core:archive --force-all-websites --url='http://matomo.lanv' >> "$LOG_PATH"



# UPDATE TIMESTAMP
logger "##### MATOMO SCRIPT : Updating timestamp in $TIMESTAMP_FILE"
echo "$NEW_TIMESTAMP" > "$TIMESTAMP_FILE"
logger "##### MATOMO SCRIPT : New timestamp : $NEW_TIMESTAMP"



logger "##### MATOMO SCRIPT : End of script"

exit 0
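
On the very first run the timestamp file does not exist yet, so cat fails and --exclude-older-than receives an empty string. One possible guard is sketched below: it falls back to a default date when the file is missing or empty, and writes the new timestamp through a temporary file so that an interrupted run cannot leave a half-written value. The helper names read_last_timestamp and write_timestamp, the fallback date and the scratch file are illustrative assumptions, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Matomo or of the script above.

# Print the stored timestamp from file $1,
# or the fallback $2 if the file is missing or empty.
read_last_timestamp() {
    if [ -s "$1" ]; then
        head -n 1 "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Write timestamp $2 to file $1 via a temporary file and a rename,
# so the target file always holds a complete value.
write_timestamp() {
    tmp="$1.tmp.$$"
    printf '%s\n' "$2" > "$tmp" && mv "$tmp" "$1"
}

# Demonstration on a scratch file with an assumed fallback date.
demo_file="${TMPDIR:-/tmp}/matomo_timestamp_demo.$$"
read_last_timestamp "$demo_file" "1970-01-01 00:00:00 +0000"   # file missing: prints the fallback
write_timestamp "$demo_file" "2024-03-26 12:06:00 +0100"
read_last_timestamp "$demo_file" "1970-01-01 00:00:00 +0000"   # prints the stored value
rm -f "$demo_file"
```

With such helpers, the line that reads the timestamp could become TIMESTAMP=$(read_last_timestamp "$TIMESTAMP_FILE" "1970-01-01 00:00:00 +0000"). Note that a fallback this old means the first run imports everything found in the synced log files.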


Links