MATOMO - Updating From Nginx Logs without duplicates
From wiki.1001solutions.net
Revision as of 19:25, 4 January 2024 by Z
UPDATING FROM NGINX LOGS WITHOUT DUPLICATES
The community edition of Matomo has no built-in way to avoid duplicates when the same log files are imported more than once.
As a workaround, import the data from the log files and use the --exclude-older-than option to skip entries that were already imported.
The quick-and-dirty script below stores the timestamp of the last run in a file and uses it to exclude older log entries when importing with the bundled script /var/www/html/matomo/misc/log-analytics/import_logs.py.
#!/bin/bash
#
# --exclude-older-than EXCLUDE_OLDER_THAN
#     Ignore logs older than the specified date. Exclusive.
#     Date format must be YYYY-MM-DD hh:mm:ss +/-0000.
#     The timezone offset is required.
#
# To print a date in this format on Linux: date +"%Y-%m-%d %H:%M:%S %z"

# VARIABLES
SLEEP_TIME=1
TIMESTAMP_FILE="/root/last_run_timestamp_for_matomo.nfo"
LOG_PATH="/var/log/matomo-archive.log"

# GET TIMESTAMP OF LAST CHECK FROM FILE
# The file must already exist and contain a date in the format above.
if [ ! -s "$TIMESTAMP_FILE" ]; then
    echo "Missing or empty $TIMESTAMP_FILE, create it with an initial date first." >&2
    exit 1
fi
TIMESTAMP=$(cat "$TIMESTAMP_FILE")
echo "$TIMESTAMP"

# GET CURRENT TIMESTAMP
NEW_TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S %z")

# CUSTOM PYTHON IMPORT COMMAND (left unquoted when called so it splits into arguments)
CUSTOM_COMMAND="python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --accept-invalid-ssl-certificate --url=http://matomo.lanv --recorders=6 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --debug-tracker"

# RSYNC REVERSE PROXY LOGS TO /TMP
# The remote globs are quoted so that the local shell does not expand them.
logger "##### MATOMO SCRIPT : Beginning script"
logger "##### MATOMO SCRIPT : Beginning rsync reverse proxy logs..."
rsync -arvz -e "ssh -p 60022" "10.24.0.1:/var/log/nginx/*example.org.access.log*" /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 60022" "10.24.0.1:/var/log/nginx/*example.org.access.log*.1" /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 60022" "10.24.0.1:/var/log/nginx/*example2.com.access.log*" /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 60022" "10.24.0.1:/var/log/nginx/*example2.com.access.log*.1" /tmp/ >> "$LOG_PATH"

# IMPORTING LOGS
logger "##### MATOMO SCRIPT : Beginning import Matomo.IT-Arts.net"
$CUSTOM_COMMAND --exclude-older-than="$TIMESTAMP" --idsite=1 /tmp/matomo.example.org.access.log* >> "$LOG_PATH"
sleep "$SLEEP_TIME"

# AND SO ON...
# ...

# ARCHIVING
logger "##### MATOMO SCRIPT : Beginning archiving"
cd /var/www/html/matomo && php console core:archive --force-all-websites --url='http://matomo.lanv' >> "$LOG_PATH"

# UPDATE TIMESTAMP
logger "##### MATOMO SCRIPT : Updating timestamp in $TIMESTAMP_FILE"
echo "$NEW_TIMESTAMP" > "$TIMESTAMP_FILE"
# Log the timestamp that was just written, not the one read at startup.
logger "##### MATOMO SCRIPT : New timestamp : $NEW_TIMESTAMP"
logger "##### MATOMO SCRIPT : End of script"
exit 0
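The timestamp handling in the script above could be made a little more defensive. What follows is only a minimal sketch, not something shipped with Matomo: the helper names (is_valid_timestamp, read_last_timestamp, write_timestamp) and the idea of passing a fallback date are assumptions made for illustration. It checks that the stored value matches the format required by --exclude-older-than, falls back to a caller-supplied date when the file is missing or malformed, and replaces the file through a temporary file so that an interrupted run does not leave a half-written timestamp.

```shell
#!/bin/bash
# Sketch only: the function names and the fallback date are illustrative
# assumptions, not part of Matomo or import_logs.py.

# Format required by --exclude-older-than: YYYY-MM-DD hh:mm:ss +/-0000
TIMESTAMP_REGEX='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}$'

# Return 0 if $1 matches the required date format, non-zero otherwise.
is_valid_timestamp() {
    printf '%s\n' "$1" | grep -Eq "$TIMESTAMP_REGEX"
}

# Print the timestamp stored in file $1, or the fallback $2 when the file
# is missing, unreadable, or does not contain a well-formed date.
read_last_timestamp() {
    local file="$1" fallback="$2" value=""
    if [ -r "$file" ]; then
        # "|| true" keeps a last line without trailing newline from counting as an error
        IFS= read -r value < "$file" || true
    fi
    if is_valid_timestamp "$value"; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$fallback"
    fi
}

# Store timestamp $2 in file $1. The value is written to a temporary file
# in the same directory and then renamed over the target.
write_timestamp() {
    local file="$1" value="$2" tmp
    is_valid_timestamp "$value" || return 1
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    printf '%s\n' "$value" > "$tmp" && mv -f "$tmp" "$file"
}
```

In the script, these helpers would stand in for the cat and echo lines, for example TIMESTAMP=$(read_last_timestamp "$TIMESTAMP_FILE" "1970-01-01 00:00:00 +0000") at the start and write_timestamp "$TIMESTAMP_FILE" "$NEW_TIMESTAMP" at the end. With the epoch as fallback, a first run simply imports everything found in the log files.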
LINKS
- https://matomo.org/faq/on-premise/installing-matomo/
- https://github.com/matomo-org/matomo-log-analytics/#readme
- https://matomo.org/faq/general/how-do-i-run-the-log-file-importer-script-with-default-options/
- https://github.com/matomo-org/matomo-log-analytics/issues/344
- https://github.com/matomo-org/matomo-nginx
- https://www.linuxcapable.com/how-to-install-matomo-with-lemp-on-ubuntu-linux/
- https://matomo.org/faq/how-to-install/faq_98/
- https://www.restack.io/docs/matomo-knowledge-matomo-error-logs-guide
- https://github.com/matomo-org/matomo-log-analytics/issues/264