|
Hi Team,
I have noticed that a lot af sql queries use the cartesian product to join tables. For example, consider the 2 following tables: customer(id_customer[PK], name, birthdate...) and addresses(id_address[PK], id_customer[FK], street, postal_code, ..) and you want to obtain all the customer data living at a given postal code. When using the cartesian product, you will do:
SELECT c.id_customer, c.name, c.birthdate FROM customer c, addresses a WHERE a.id_customer = c.id_customer AND a.postal_code = "66280"
the system will combine the number of rows in customer table with the number of rows of addresses table and apply the filter after.
When using JOIN clauses, you will do:
SELECT c.id_customer, c.name, c.birthdate FROM customer c INNER JOIN addresses a ON a.id_customer = c.id_customer WHERE a.postal_code = "66280"
and the filter will be applied on the joined tables which correspond to the good rows.
So, I don't know if it was clear, but on big row number tables, you will certainly have performance problems by using cartesian product. At the same time, cartesian product is less precise than join (you can use left join or right join); for example if you have no adress for a customer, you will lose information about this customer by using cartesian product. Instead, by using the left join clause, you won't have this problem.
So, it's important, I think, to fix this for future as huge tables will penalise the global performance of the website. It will be useful too for the results of the queries.
Best Regards
Helgvor STOLL
|