DeHackEd t1_jegltx8 wrote on March 31, 2023 at 9:32 PM

"Keys" are selections of column in a table that are indexed in some way, intended to allow for much faster searching. Typically the index is stored in a different file from the raw table records. With that said...

A "primary key" is the key designed to uniquely identify rows in a table. So if a primary key is defined on columns (a,b,c) then a search with condition "WHERE a=1 AND b=2 AND c=3" will turn up exactly 0 or 1 rows. 2+ rows are impossible. As a result, rules that may allow multiple row matches, especially where NULL is involved, are forbidden. For example, if you run a shopping store, then products could be uniquely identified by their barcode numbers. Duplicate of that should never happen, and if they do then yeah you have a problem that needs solving anyway.

A "foreign key" is a cross-reference between tables. Table X may have a column named A and it's a foreign key reference to table Y which also has a column named A (typically the same name to help reinforce they are the same piece of information for that cross-reference). The database may refuse to allow you to delete information in one table because the other references it, or the act of deleting from one table may cause deletion of data in the other table to keep the foreign key requirements met, depending on settings. It also helps software visualize the relationship between the tables. For example, if you have a table of customers, and a table of outstanding orders, you may not delete a customer if they have any outstanding orders placed, but deleting orders is fine.

"Clustered index" means the actual raw records on disk are sorted to this ordering. There can be advantages to that if you are fetching large amounts of data and you want it delivered pre-sorted... if you ask for it in the same sort order that the clustered index is provided in, then the DB just does sequential disk reads and gets data back pre-sorted, solving that problem without needing to actually perform a sort in memory or anything.

Now let's go back to the primary key: a unique way to identify rows. How do you actually do that? There's 2 common strategies: a unique number typically starting at 1 and counting up indefinitely, or finding some set of columns that are actually enough to uniquely identify a row. The former is what we call a "surrogate" key - something we introduced to assist. The latter is what we call a natural key - something already provided. For our list of products, barcodes make a pretty good natural key. For our customers, names aren't unique, even addresses aren't uniqu and we don't have anything we can give them (damned customers keep refusing our loyalty cards) so we just give them a customer number and each new customers gets the next number. That's a surrogate key. Customers might see the number on their invoices, but by itself the number doesn't mean thing.

DennisTheBald t1_jegpw8x wrote on March 31, 2023 at 10:01 PM

Wow, you've really unlocked keys

SundanceInTheTrees t1_jegzpap wrote on March 31, 2023 at 11:14 PM

Wish I could have read this while I was back in college. Well written!

Rcomian t1_jegmbsh wrote on March 31, 2023 at 9:36 PM

primary key: the thing that is guaranteed to identify a row in a table. it might be a single column that's an incremental number (an identity column), or it might be something like a guid, or it might be two columns that together make a unique value, but might not be unique in themselves.

foreign key: when one table links a column or columns to the primary key in a different table. this is good for referential integrity, you can't put a value into the foreign key columns unless they actually refer to a row in the other table. and that row is prevented from being deleted if something references it. (or you do a cascading delete, that deletes all records pointing to the initial record you're deleting).

clustered index: the columns in the table that define the order that data is stored in. normally it's the primary key. sometimes it can be useful to have that as a timestamp, so records from similar times are all kept together in storage.

natural key: sometimes the data you're given has a key that uniquely identifies the record already available in it. a (bad) example might be your social security number. that uniquely identifies you from everyone else in the usa. isn't good for foreigners tho. the chassis number on a car might be another natural key.

surrogate key: if there's no natural primary key in your data, you'll need to give it one. if you just number the first inserted record as 1, the next as 2, etc, that key is not part of the natural data you're storing, it's surrogating for a natural key that isn't present.

[deleted] t1_jegi93w wrote on March 31, 2023 at 9:07 PM

[removed]

ELI5: Can someone please explain to me the difference between a primary key, foreign key, clustered index, natural key, and surrogate key?

Comments

DeHackEd t1_jegltx8 wrote on March 31, 2023 at 9:32 PM

DennisTheBald t1_jegpw8x wrote on March 31, 2023 at 10:01 PM

SundanceInTheTrees t1_jegzpap wrote on March 31, 2023 at 11:14 PM

Rcomian t1_jegmbsh wrote on March 31, 2023 at 9:36 PM

[deleted] t1_jegi93w wrote on March 31, 2023 at 9:07 PM