Detectar y eliminar duplicados en SQL

delDuplicates.py
import mariadb
 
# Connect to the database
cnx = mariadb.connect(
    host="localhost",
    port=3306,
    user="user",
    password="my****SUPERSECRET****Password",
    database="VeeamReports"
)
 
# Get all tables in the database
cursor = cnx.cursor()
cursor.execute("SHOW TABLES")
tables = cursor.fetchall()
 
# Iterate over each table
for table in tables:
    table_name = table[0]
 
    # Create a query to delete duplicates based on 'creationtime' and 'vmname'
    # The INNER JOIN compares rows in the same table (t1 and t2) based on 'creationtime' and 'vmname'
    # It then deletes rows where t1.id is less than t2.id, keeping only the row with the lowest id
    query = f"""
    DELETE t1 FROM {table_name} t1
    INNER JOIN {table_name} t2
    WHERE t1.creationtime = t2.creationtime AND t1.vmname = t2.vmname AND t1.id < t2.id
    """
    cursor.execute(query)
 
# Commit the changes and close the connection
cnx.commit()
cnx.close()

Estos comentarios explican cada sección del script:

  1. Connect to the database: Establece la conexión con la base de datos.
  2. Get all tables in the database: Obtiene una lista de todas las tablas en la base de datos.
  3. Iterate over each table: Recorre cada tabla.
  4. Create a query to delete duplicates based on 'creationtime' and 'vmname': Crea una consulta para eliminar duplicados basados en las columnas 'creationtime' y 'vmname'.
    1. INNER JOIN: Compara filas en la misma tabla (`t1` y `t2`) basándose en 'creationtime' y 'vmname'.
    2. WHERE clause: Elimina filas donde `t1.id` es menor que `t2.id`, conservando solo la fila con el id más bajo.
  5. Commit the changes and close the connection: Confirma los cambios y cierra la conexión con la base de datos.