Many-To-Many Relationship Definition Essay

Today, we continue our journey into the world of SQL and relational database systems. In part three of this series, we'll learn how to work with multiple tables that have relationships with each other. First, we will go over some core concepts, and then we will begin working with JOIN queries in SQL.

Introduction

When creating a database, common sense dictates that we use separate tables for different types of entities. Some examples are customers, orders, items and messages. But we also need to have relationships between these tables. For instance, customers make orders, and orders contain items. These relationships need to be represented in the database. Also, when fetching data with SQL, we need to use certain types of JOIN queries to get what we need.

There are several types of database relationships. Today we are going to cover the following:

  • One to One Relationships
  • One to Many and Many to One Relationships
  • Many to Many Relationships
  • Self Referencing Relationships

When selecting data from multiple tables with relationships, we will be using the JOIN query. There are several types of JOINs, and we are going to learn about the following:

  • Cross Joins
  • Natural Joins
  • Inner Joins
  • Left (Outer) Joins
  • Right (Outer) Joins

We will also learn about the ON clause and the USING clause.

One to One Relationships

Let's say you have a table for customers:

We can put the customer address information on a separate table:

Now we have a relationship between the Customers table and the Addresses table. If each address can belong to only one customer, this relationship is "One to One". Keep in mind that this kind of relationship is not very common. Our initial table that included the address along with the customer could have worked fine in most cases.

Notice that there is now a field named "address_id" in the Customers table that refers to the matching record in the Addresses table. This is called a "Foreign Key" and it is used for all kinds of database relationships. We will cover this subject later in the article.

We can visualize the relationship between the customer and address records like this:

Note that the existence of a relationship can be optional, like having a customer record that has no related address record.
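The two tables described in this section could be declared roughly as follows. This is only a sketch for a scratch database: the address columns (street, city) are assumptions made for illustration, and the UNIQUE constraint on address_id is one possible way to keep the relationship "One to One".

```sql
-- Hypothetical address columns; only address_id is named in the article.
CREATE TABLE addresses (
    address_id INT AUTO_INCREMENT PRIMARY KEY,
    street     VARCHAR(100),
    city       VARCHAR(50)
);

CREATE TABLE customers (
    customer_id   INT AUTO_INCREMENT PRIMARY KEY,
    customer_name VARCHAR(100),
    -- UNIQUE: an address can be linked to at most one customer.
    -- The column stays nullable, so a customer may have no address.
    address_id    INT UNIQUE,
    FOREIGN KEY (address_id) REFERENCES addresses(address_id)
);
```

Leaving address_id nullable is what makes the relationship optional, as noted above.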

One to Many and Many to One Relationships

This is the most commonly used type of relationship. Consider an e-commerce website, with the following:

  • Customers can make many orders.
  • Orders can contain many items.
  • Items can have descriptions in many languages.

In these cases we would need to create "One to Many" relationships. Here is an example:

Each customer may have zero, one or multiple orders. But an order can belong to only one customer.

Many to Many Relationships

In some cases, you may need multiple instances on both sides of the relationship. For example, each order can contain multiple items. And each item can also be in multiple orders.

For these relationships, we need to create an extra table:

The Items_Orders table has only one purpose, and that is to create a "Many to Many" relationship between the items and the orders.
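A minimal sketch of such an extra table, for a scratch database. The items and orders tables are cut down to a bare minimum and their column names are assumptions; only the items_orders name comes from the text.

```sql
CREATE TABLE items (
    item_id   INT AUTO_INCREMENT PRIMARY KEY,
    item_name VARCHAR(100)
);

CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    amount   DOUBLE
);

-- One row per (item, order) pair. The composite primary key
-- prevents the same pair from being stored twice.
CREATE TABLE items_orders (
    item_id  INT NOT NULL,
    order_id INT NOT NULL,
    PRIMARY KEY (item_id, order_id),
    FOREIGN KEY (item_id)  REFERENCES items(item_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

-- All items contained in order 1
SELECT items.*
FROM items
JOIN items_orders USING (item_id)
WHERE items_orders.order_id = 1;
```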

Here is how we can visualize this kind of relationship:

If you want to include the items_orders records in the graph, it may look like this:

Self Referencing Relationships

This is used when a table needs to have a relationship with itself. For example, let's say you have a referral program. Customers can refer other customers to your shopping website. The table may look like this:

Customers 102 and 103 were referred by the customer 101.
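As a sketch, such a table could be declared with a foreign key that points back at its own primary key. The column name referrer_customer_id and the customer names are assumptions for illustration; the ids 101 to 103 follow the example above. Run it in a scratch database, since it reuses the customers table name.

```sql
CREATE TABLE customers (
    customer_id          INT PRIMARY KEY,
    customer_name        VARCHAR(100),
    -- NULL means the customer was not referred by anyone
    referrer_customer_id INT NULL,
    FOREIGN KEY (referrer_customer_id) REFERENCES customers(customer_id)
);

-- The referrer must exist before the customers who reference it
INSERT INTO customers VALUES (101, 'Customer A', NULL);
INSERT INTO customers VALUES (102, 'Customer B', 101), (103, 'Customer C', 101);

-- Self join: list each customer next to whoever referred them
SELECT c.customer_name, r.customer_name AS referred_by
FROM customers AS c
LEFT JOIN customers AS r ON c.referrer_customer_id = r.customer_id;
```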

This is really a kind of "one to many" relationship, since one customer can refer multiple customers. It can also be visualized as a tree structure:

One customer might refer zero, one or multiple customers. Each customer can be referred by only one customer, or none at all.

If you would like to create a self referencing "many to many" relationship, you would need an extra table, just like we talked about in the last section.

Foreign Keys

So far we have only learned about some of the concepts. Now it is time to bring them to life using SQL. For this part, we need to understand what Foreign Keys are.

In the relationship examples above, we always had these "****_id" fields that referenced a column in another table. In this example, the customer_id column in the Orders table is a Foreign Key column:

With a database like MySQL, there are two ways to create foreign key columns:

Defining the Foreign Key Explicitly

Let's create a simple customers table:

    CREATE TABLE customers (
        customer_id INT AUTO_INCREMENT PRIMARY KEY,
        customer_name VARCHAR(100)
    );

Now the orders table, which will contain a Foreign Key:

    CREATE TABLE orders (
        order_id INT AUTO_INCREMENT PRIMARY KEY,
        customer_id INT,
        amount DOUBLE,
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    );

Both columns (customers.customer_id and orders.customer_id) should have exactly the same data type. If one is INT, the other one should not be BIGINT, for example.

Please note that in MySQL only the InnoDB engine has full support for Foreign Keys. But other storage engines will still allow you to specify them without giving any errors. Also the Foreign Key column is indexed automatically, unless you specify another index for it.

Without Explicit Declaration

The same orders table can be created without explicitly declaring the customer_id column to be a Foreign Key:

    CREATE TABLE orders (
        order_id INT AUTO_INCREMENT PRIMARY KEY,
        customer_id INT,
        amount DOUBLE,
        INDEX (customer_id)
    );

When retrieving data with a JOIN query, you can still treat this column as a Foreign Key even though the database engine is not aware of that relationship.

SELECT * FROM orders JOIN customers USING(customer_id)

We are going to learn about JOIN queries further in the article.

Visualizing the Relationships

My current favorite software for designing databases and visualizing the Foreign Key relationships is MySQL Workbench.

Once you design your database, you can export the SQL and run it on your server. This comes in very handy for bigger and more complex database designs.

JOIN Queries

To retrieve data from a database that has relationships, we often need to use JOIN queries.

Before we get started, let's create the tables and some sample data to work with.

    CREATE TABLE customers (
        customer_id INT AUTO_INCREMENT PRIMARY KEY,
        customer_name VARCHAR(100)
    );

    CREATE TABLE orders (
        order_id INT AUTO_INCREMENT PRIMARY KEY,
        customer_id INT,
        amount DOUBLE,
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    );

    INSERT INTO `customers` (`customer_id`, `customer_name`) VALUES
        (1, 'Adam'),
        (2, 'Andy'),
        (3, 'Joe'),
        (4, 'Sandy');

    INSERT INTO `orders` (`order_id`, `customer_id`, `amount`) VALUES
        (1, 1, 19.99),
        (2, 1, 35.15),
        (3, 3, 17.56),
        (4, 4, 12.34);

We have 4 customers. One customer has two orders, two customers have one order each, and one customer has no order. Now let's see the different kinds of JOIN queries we can run on these tables.

Cross Join

This is the default type of JOIN query when no condition is specified.

The result is a so called "Cartesian product" of the tables. It means that each row from the first table is matched with each row of the second table. Since each table had 4 rows, we ended up getting a result of 16 rows.

The JOIN keyword can be optionally replaced with a comma instead.
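The queries this section describes are not shown in the text. Against the sample customers and orders tables created above, they would look roughly like this in MySQL:

```sql
-- JOIN with no condition: every customer row paired with every order row
SELECT * FROM customers JOIN orders;

-- The same Cartesian product, written with a comma instead of JOIN
SELECT * FROM customers, orders;
```

With the sample data, each query returns 4 x 4 = 16 rows.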

Of course this kind of result is usually not useful. So let's look at the other join types.

Natural Join

With this kind of JOIN query, the tables need to have a matching column name. In our case, both tables have the customer_id column. So MySQL will join two records only when they have the same value in this column.

As you can see the customer_id column is only displayed once this time, because the database engine treats this as the common column. We can see the two orders placed by Adam, and the other two orders by Joe and Sandy. Finally we are getting some useful information.
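For reference, a query of this kind against the sample tables would be written as follows (a sketch; the original query is not shown in the text):

```sql
-- Joins on every column name the two tables share; here only customer_id
SELECT * FROM customers NATURAL JOIN orders;
```

Because the join columns are picked implicitly by name, adding a new same-named column to both tables later would silently change the result, which is one reason to prefer the explicit ON or USING forms covered below.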

Inner Join

When a join condition is specified, an Inner Join is performed. In this case, it would be a good idea to have the customer_id field match on both tables. The results should be similar to the Natural Join.

The results are the same except for a small difference. The customer_id column appears twice, once for each table. The reason is that we merely asked the database to match the values in these two columns. It is actually unaware that they represent the same information.

Let's add some more conditions to the query.

This time we received only the orders over $15.
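The two queries discussed in this section are not shown in the text. A reconstruction that matches the described results, using the sample tables and a WHERE clause for the join condition, might be:

```sql
-- Inner join: only customer/order pairs with matching customer_id
SELECT *
FROM customers JOIN orders
WHERE customers.customer_id = orders.customer_id;

-- The same join with an extra condition: only orders over $15
SELECT *
FROM customers JOIN orders
WHERE customers.customer_id = orders.customer_id
  AND orders.amount > 15;
```

With the sample data, the first query returns all four orders and the second drops Sandy's order of 12.34.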

ON Clause

Before moving on to other join types, we need to look at the ON clause. This is useful for putting the JOIN conditions in a separate clause.

Now we can distinguish the JOIN condition from the WHERE clause conditions. But there is also a slight difference in functionality. We will see that in the LEFT JOIN examples.
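As a sketch against the sample tables, the previous query can be rewritten so that the join condition lives in ON and the filter stays in WHERE:

```sql
SELECT *
FROM customers
JOIN orders ON (customers.customer_id = orders.customer_id)
WHERE orders.amount > 15;
```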

USING Clause

The USING clause is similar to the ON clause, but it's shorter. If a column has the same name in both tables, we can specify it here.

In fact, this is much like the NATURAL JOIN, so the join column (customer_id) is not repeated twice in the results.
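The same query written with USING, again as a sketch against the sample tables:

```sql
-- customer_id appears only once in the result columns
SELECT *
FROM customers
JOIN orders USING (customer_id)
WHERE orders.amount > 15;
```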

Left (Outer) Join

A LEFT JOIN is a type of Outer Join. In these queries, if there is no match found from the second table, the record from the first table is still displayed.

Even though Andy has no orders, his record is still being displayed. The values under the columns of the second table are set to NULL.

This is also useful for finding records that do not have relationships. For example, we can search for customers who have not placed any orders.

All we did was to look for NULL values for the order_id.

Also note that the OUTER keyword is optional. You can just use LEFT JOIN instead of LEFT OUTER JOIN.
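The queries for this section are not shown in the text. Against the sample tables they could be written like this:

```sql
-- Every customer, with order columns set to NULL where there is no order
SELECT *
FROM customers
LEFT OUTER JOIN orders USING (customer_id);

-- Customers who have not placed any orders
SELECT *
FROM customers
LEFT OUTER JOIN orders USING (customer_id)
WHERE orders.order_id IS NULL;
```

With the sample data, the second query returns only Andy.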

Conditionals

Now let's look at a query with a condition.

So what happened to Andy and Sandy? LEFT JOIN was supposed to return customers with no matching orders. The problem is that the WHERE clause is blocking those results. To get them we can try to include the NULL condition as well.

We got Andy but no Sandy. Still this does not look right. To get what we want, we need to use the ON clause.

Now we got everyone, and all orders above $15. As I said earlier, the ON clause sometimes has slightly different functionality than the WHERE clause. In an Outer Join like this one, rows from the first table are included even if no row in the second table satisfies the ON clause conditions; the second table's columns are simply filled with NULL.
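The three queries walked through in this section are not shown in the text. The following reconstruction, run against the sample tables, is consistent with the results described:

```sql
-- 1. Filter in WHERE: Andy (no order) and Sandy (12.34) both disappear
SELECT *
FROM customers
LEFT JOIN orders USING (customer_id)
WHERE orders.amount > 15;

-- 2. Also allow NULL order rows: Andy comes back, Sandy is still missing
SELECT *
FROM customers
LEFT JOIN orders USING (customer_id)
WHERE orders.amount > 15
   OR orders.order_id IS NULL;

-- 3. Move the amount condition into ON: every customer is kept,
--    and only orders above 15 are attached to them
SELECT *
FROM customers
LEFT JOIN orders
  ON (customers.customer_id = orders.customer_id AND orders.amount > 15);
```

In the third query Sandy appears with NULL order columns, because her only order does not satisfy the ON condition.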

Right (Outer) Join

A RIGHT OUTER JOIN works exactly the same, but the order of the tables is reversed.

This time we have no NULL results because every order has a matching customer record. We can change the order of the tables and get the same results as we did from the LEFT OUTER JOIN.

Now we have those NULL values because the customers table is on the right side of the join.
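As a sketch against the sample tables, the two queries described here would be:

```sql
-- orders is on the right: every order is kept, and each one has a customer
SELECT *
FROM customers
RIGHT OUTER JOIN orders USING (customer_id);

-- customers is on the right: every customer is kept, so Andy shows NULLs
SELECT *
FROM orders
RIGHT OUTER JOIN customers USING (customer_id);
```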

Conclusion

Thank you for reading the article. I hope you enjoyed it! Please leave your comments and questions, and have a great day!


An introduction to databases

Introduction

The Celtic Inscribed Stones Project (CISP) is jointly run between the Department of History, UCL, and the Institute of Archaeology, UCL, under the direction of Prof. Wendy Davies in collaboration with Prof. James Graham-Campbell. The project currently (as of October 18, 2000) employs three full-time staff (Dr Kris Lockyear, Dr Mark Handley and Dr Paul Kershaw). The database structure described in this manual was constructed by Dr Kris Lockyear and former research fellow Dr Katherine Forsyth. The first three years of the Project were funded by the HRB/HEFCE via their institutional fellowship scheme.

CISP's aim is to undertake a collaborative, interdisciplinary study of Early Medieval Celtic inscriptions. One of its main objectives is the compilation of a comprehensive and authoritative database of all known inscriptions from Great Britain, Ireland and Brittany. By bringing this material together in one place and making it readily available our goal is to turn what is a largely untapped resource into usable material.

Further details of the Project are available on the Project's web pages
(http://www.ucl.ac.uk/archaeology/cisp).

This guide and manual is intended both as a general introduction to the CISP database, and as a detailed guide for data entry. This chapter contains an introduction to databases, database management systems, and data structures (terms which are discussed below). The subsequent chapters discuss the contents of the CISP database, and provide a detailed table by table, field by field guide to the database including allowed terms and definitions of fields and entries, and a short guide to the CISP data entry application. Appendices provide a glossary of terms and list major changes to the database since the first version of this manual.

Database concepts

This section discusses a number of database concepts and is primarily intended for those who have had little or no experience of computer-based databases.

Databases

A database is a structured collection of data. Thus, card indices, printed catalogues of archaeological artefacts and telephone directories are all examples of databases. Databases may be stored on a computer and examined using a program. These programs are often called `databases', but more strictly are database management systems (DMS). Just as a card index or catalogue has to be constructed carefully in order to be useful, so must a database on a computer. Similarly, just as there are many ways that a printed catalogue can be organised, there are many ways, or models, by which a computerised database may be organised. One of the most common and powerful models is the `relational' model (discussed below), and programs which use this model are known as relational database management systems (RDMS).

Computer-based databases are usually organised into one or more tables. A table stores data in a format similar to a published table and consists of a series of rows and columns. To carry the analogy further, just as a published table will have a title at the top of each column, so each column in a database table will have a name, often called a field name. The term field is often used instead of column. Each row in a table will represent one example of the type of object about which data has been collected. Table 1(a) is an example of a table from a database of English towns. Each row, in this case a town, is an entity, and each column represents an attribute of that entity. Thus, in this table `population' is an attribute of `town.'

One advantage of computer-based tables is that they can be presented on screen in a variety of orders or formats, or filtered according to certain criteria: for example, all the towns in Hertfordshire, or all towns with a cathedral.

Specific purpose vs. resource databases

Databases often fall into one of two broad categories. The first comprises specific purpose, limited databases. In academia, these often contain data gathered to perform a relatively limited rôle only in a particular project. The database may be intended to provide the researcher with a particular set of data, but have no particular function or rôle at the conclusion of the project. For example, Lockyear's Coin Hoards of the Roman Republic (CHRR) database included only data necessary for the project in hand [Lockyear 1996, chapter 5].

The second category comprises general purpose, resource databases. Good examples of resource databases are county archaeological sites and monuments records (SMRs), or national monuments records (Hansen 1993). These databases are not project specific but are intended to be of use to a wide variety of users. Resource databases usually attempt to be comprehensive within their `domain of discourse', are maintained and updated, and are made available to interested parties. As these databases attempt to be comprehensive in order to accommodate unpredicted enquiries and research, they include a wide variety of data which in turn requires a complex `data structure', or way of storing the information.

The CISP database is intended to be a resource database and as a result has a complex data structure (discussed below). This structure, however, provides great power and flexibility, both for the retrieval and handling of the data and for future expansion of the database to include other information and materials.

Relational databases

A common and powerful method for organising data for computerisation is the relational data model. Use of this model often results in a database with many tables, and a common question is why such a complex structure should be necessary. Table 1(b) is an example of bad table design with the same towns as in Table 1(a) but with some additional information (the population and the area of the counties) added. We can see from this table that the size and population of Hertfordshire is repeated three times. This duplication is called data redundancy. Data redundancy is a problem for several reasons:

  • It is a waste of time to enter the same data repeatedly.
  • It increases the possibilities of error. In Table 1(b) the population for Hertfordshire has been mis-typed in the third row.
  • Entry errors will create errors in data retrieval, which are likely to be less visible/predictable in complex queries.
  • It is a waste of disk space--this can be a major consideration with large databases.
  • It can slow down some queries on the database.
  • Updates or corrections have to be applied to multiple rows.

Table 1(a): A table of English towns

town                 county          population   county town?   cathedral?
Welwyn Garden City   Hertfordshire   40,570       no             no
St. Albans           Hertfordshire   123,800      no             yes
Hertford             Hertfordshire   2,023        yes            no
Durham               Durham          29,490       yes            yes

Table 1(b): A badly designed table

town                 county          population   county town?   cathedral?   county population   county size
Welwyn Garden City   Hertfordshire   40,570       no             no           937,300             631
St. Albans           Hertfordshire   123,800      no             yes          937,300             631
Hertford             Hertfordshire   2,023        yes            no           397,300             631
Durham               Durham          29,490       yes            yes          132,681             295
                     Essex                                                    1,426,200           1,528

Table 1(c): A table of counties

county          population   size (square miles)
Hertfordshire   937,300      631
Durham          132,681      295
Essex           1,426,200    1,528

A second problem with the table can be seen in the last row. We have information about the population of Essex as a whole but none about any individual town. To accommodate this information we have had to create a row of data with only partial information. As well as these problems, a poor data structure can lead to inflexibility in the use of the database, and possibly problems in retrieving data in the form required. Examples of poor database design are all too common.

To solve these problems, the data should be split into several tables. To follow the town example through, we could have a table of towns as given in Table 1(a). Each item of information stored in this table is an attribute of a town. The information about counties is then stored in a second, separate table of counties as shown in Table 1(c). In this table every item of information is an attribute of a county. This process of breaking data down into a series of tables is called normalisation and is the first and most important step in designing a relational database.
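The split into Tables 1(a) and 1(c) can be sketched in SQL as follows. The manual does not tie itself to a particular database system, so the MySQL-style syntax, the column names and the data types here are all assumptions; the values are taken from the tables above.

```sql
-- Table 1(c): every column is an attribute of a county
CREATE TABLE counties (
    county        VARCHAR(50) PRIMARY KEY,
    population    INT,
    size_sq_miles INT
);

-- Table 1(a): every column is an attribute of a town
CREATE TABLE towns (
    town           VARCHAR(50),
    county         VARCHAR(50),
    population     INT,
    is_county_town BOOLEAN,
    has_cathedral  BOOLEAN,
    PRIMARY KEY (town, county),
    FOREIGN KEY (county) REFERENCES counties(county)
);

-- Each county's figures are stored once; Essex needs no dummy town row
INSERT INTO counties VALUES
    ('Hertfordshire', 937300, 631),
    ('Durham', 132681, 295),
    ('Essex', 1426200, 1528);

INSERT INTO towns VALUES
    ('Welwyn Garden City', 'Hertfordshire', 40570, FALSE, FALSE),
    ('St. Albans', 'Hertfordshire', 123800, FALSE, TRUE),
    ('Hertford', 'Hertfordshire', 2023, TRUE, FALSE),
    ('Durham', 'Durham', 29490, TRUE, TRUE);

-- The combined view of Table 1(b) can still be produced on demand
SELECT t.town, t.population,
       c.population AS county_population, c.size_sq_miles
FROM towns AS t
JOIN counties AS c ON t.county = c.county;
```

Because the Hertfordshire population now exists in a single row, the mis-typed value in Table 1(b) cannot occur.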

Normalisation is the process of identifying entities and their attributes, and defining the relationship between the entities. In our example we have two entities, towns and counties, and we have recorded various attributes (Tables 1(a) & 1(c)). There are three types of relationship between entities: one-to-one, one-to-many, and many-to-many. Figure 1 shows the different types of relationship, which are discussed in detail below, in diagrammatic form. This type of diagram is known as an entity relationship diagram.


One-to-one relationships

This is where there is, for any one entity, only one example of another related entity. For example, if we had only collected data about county towns, there would be a one-to-one relationship between each entry (county) in the table of counties and a town in the table of county towns. This type of relationship is shown in Figure 1(a). It would be possible, although not really desirable, to store all the information in one table in this case.

A special case of a one-to-one relationship is where particular pieces of information only exist, or are only applicable, to some of the entries in a table. In our geographical example we may wish to record the length of coast line or other attributes which only relate to counties which border the sea. In these cases one can create a separate table for this information. This helps to save disk space on the computer, minimise data entry time, and break down potentially large tables. This type of relationship is shown in Figure 1(b).


One-to-many relationships

This is where there are, for any one entity, many examples of another entity. This is the relationship between the counties as shown in Table 1(c) and the towns in Table 1(a): a town can only have one county but a county will have many towns. In these cases, the information about each entity must be stored in separate tables. This type of relationship is shown in Figure 1(c).


Many-to-many relationships

This is where an entity can have many examples of another entity but this second entity can also have many examples of the first. In our geographical example, we may want to store information about rivers. Any one county has many rivers, but similarly, a river is likely to flow through many counties. This type of relationship is illustrated in Figure 1(d).

This type of relationship necessitates the use of a third table. This effectively creates two one-to-many relationships. These intermediate tables can be called linking tables. These tables often only contain two columns which act as a link between the two main tables. In our geographical example, the linking table would contain the names of counties, and the names of rivers only. This solution to modelling many-to-many relationships is illustrated in Figure 1(e).
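A linking table for the rivers example might be sketched like this. The table and column names are assumptions, the counties table is reduced to its key, and the syntax is MySQL-style; other systems will differ in detail.

```sql
CREATE TABLE counties (
    county VARCHAR(50) PRIMARY KEY
);

CREATE TABLE rivers (
    river VARCHAR(50) PRIMARY KEY
);

-- Linking table: two columns only, one row per (county, river) pair
CREATE TABLE counties_rivers (
    county VARCHAR(50) NOT NULL,
    river  VARCHAR(50) NOT NULL,
    PRIMARY KEY (county, river),
    FOREIGN KEY (county) REFERENCES counties(county),
    FOREIGN KEY (river)  REFERENCES rivers(river)
);

-- All rivers flowing through a given county
SELECT river
FROM counties_rivers
WHERE county = 'Hertfordshire';
```

Each foreign key is one of the two one-to-many relationships that together replace the many-to-many relationship.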

[Figure 1: Entity relationship diagrams. (a) A one-to-one relationship. (b) A one-to-one relationship for some entities only. (c) A one-to-many relationship. (d) A many-to-many relationship. (e) Splitting a many-to-many relationship into two one-to-many relationships.]


Primary and foreign keys

Every row in a table in a relational database must be unique; there must not be two identical rows. One or more columns are therefore designated the primary key (sometimes called the unique identifier) for the items contained within it. Thus, in Table 1(a) the column `town' could act as the primary key, and in Table 1(c) column `county' can act as that table's primary key. This concept has been used in paper-based (published) databases: each inscribed stone catalogued in R. A. S. Macalister's Corpus Inscriptionum Insularum Celticarum (Macalister 1945; Macalister 1949) has a unique identifying number, as does each hoard in Crawford's Roman Republican Coin Hoards (Crawford 1969b).

In our geographical example, however, there can be more than one town with the same name, Newcastle or Newport for example. In this case we could designate the `town' and the `county' columns together as the primary key.

Foreign keys are columns in a table which provide a link to another table. In our geographical example, the county column in our table of towns provides a link to the table of counties, and is thus a key field in that relationship. It is very important therefore to ensure that entries in both tables are identical: both tables should use the full county name (Hertfordshire) or an abbreviation (Herts), but not a mixture of the two.

There is one final complexity which must be addressed. What could we do in the case where there are two towns with the same name in the same county? Although in our example it is unlikely, in databases of other information this could happen. We could use a combination of name, county and population as the primary key for the table of towns. If we had a table of shops, we would have to include the town name, county and the population to provide a link between the two tables. This, however, will re-introduce the problem of data redundancy. A better course of action is to assign a unique code to each town, and to use this code as the link to the table of shops. The use of codes has other advantages: it can be quite short and thus save time during data entry and disk space. These codes can be assigned by the user, WGC for Welwyn Garden City, or could be a sequential number created automatically by the program.
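The use of a code as the link can be sketched as follows. The table and column names are assumptions, only the code WGC comes from the text, and the syntax is MySQL-style.

```sql
CREATE TABLE towns (
    town_code  VARCHAR(10) PRIMARY KEY,  -- user-assigned code, e.g. 'WGC'
    town       VARCHAR(50) NOT NULL,
    county     VARCHAR(50) NOT NULL,
    population INT
);

-- Shops link to a town through the short code alone, so the town name,
-- county and population are not repeated in every shop row
CREATE TABLE shops (
    shop_id   INT AUTO_INCREMENT PRIMARY KEY,  -- sequential number created by the program
    shop_name VARCHAR(100),
    town_code VARCHAR(10) NOT NULL,
    FOREIGN KEY (town_code) REFERENCES towns(town_code)
);

INSERT INTO towns VALUES ('WGC', 'Welwyn Garden City', 'Hertfordshire', 40570);
```

The shops table shows both options mentioned above: town_code is assigned by the user, while shop_id is generated automatically.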


Data types and definition

The data stored in tables can be classified into types. In Table 1(a) the first column can contain any letter, number, or other character (such as {, or &). This is an alphanumeric data type, also known as a string or character field. The third column for population contains a number and is a numeric data type. The last two columns are `logical' and can only contain yes or no. There are other data types such as date or even images and sounds.

The type of data is important as different types of data behave in different ways. A good example is the sorting order of a series of numbers. If we store 1, 22, 3, 10, 2 and 15 in a numeric column, and ask the program to sort the rows of the table on this column, we will get 1, 2, 3, 10, 15, 22 as we might expect. If that column was defined as an alphanumeric data type, the result would be 1, 10, 15, 2, 22, 3, a rather different result! Different DMSs have different ways of handling different types of data (see below).
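The difference in sorting order is easy to reproduce. The sketch below stores the same six values once as numbers and once as text; the table and column names are made up for the demonstration.

```sql
CREATE TABLE sort_demo (
    as_number INT,
    as_text   VARCHAR(10)
);

INSERT INTO sort_demo VALUES
    (1, '1'), (22, '22'), (3, '3'), (10, '10'), (2, '2'), (15, '15');

-- Numeric sort: 1, 2, 3, 10, 15, 22
SELECT as_number FROM sort_demo ORDER BY as_number;

-- Alphanumeric sort, compared character by character: 1, 10, 15, 2, 22, 3
SELECT as_text FROM sort_demo ORDER BY as_text;
```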

Each column of data also has to be defined. This can be quite simple: `the county column will contain the full county name'. We also have to decide what the entries mean. In the table of counties we have a column for area, so we have to decide if this is the area in square miles or square kilometers.

We may wish to restrict the possible entries in a column. We can do this to prevent errors: we may decide that the maximum allowed population in a town is 10,000,000, as no town in Britain has a population larger than that. We may also wish to restrict entries to a limited list of terms. If, for example, we had `type' as an attribute of town, we could have market town, small town, county town, village, small village, hamlet and so on. If any term was allowable, this attribute would not be very useful for retrieving groups of settlements in any meaningful way. We might, therefore, create a list of allowed terms which are precisely defined and which would therefore allow meaningful data retrieval.
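One way to express the population limit inside the table definition is a CHECK constraint, sketched below with assumed column names. Support varies between systems: MySQL, for example, enforces CHECK constraints only from version 8.0.16 onwards and silently ignores them in earlier versions, so this should be tested on the system actually in use.

```sql
CREATE TABLE towns (
    town       VARCHAR(50),
    county     VARCHAR(50),
    population INT,
    -- Reject obviously mistyped populations
    CHECK (population BETWEEN 0 AND 10000000)
);

-- Where the constraint is enforced, this insert is rejected
INSERT INTO towns VALUES ('Hertford', 'Hertfordshire', 202300000);
```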


Look-up tables

In the previous section restricted data fields were discussed. How, in practice, are the entries in fields to be restricted? The first method is for the allowed terms to be listed in a manual such as this one, and for every user to be disciplined enough only to use those terms, and to check that they have used the correct ones. There are advantages, however, in storing these terms on the computer along with the main tables of data. There are thus two further methods. The first is to include the definitions in a database application (see below), or in the way the table is defined within the DMS. This has the disadvantage that the information is dependent on the software being used, and if the data is transferred (`ported') to another program this information will be lost. It is also difficult to add new terms to the list. The second alternative method is to use look-up tables, of which there are two types, simple and hierarchical.

Simple look-up tables typically consist of one or two columns. In a one column example, the list of allowed terms is stored in the table; in a two column example the first column stores the allowed term, often in the form of a code, and the second column stores the definition of that term or code. Good examples of simple look-up tables are the POSIT1, POSIT2 and POSIT3 tables discussed later in this manual.

Hierarchical look-up tables are very similar in that one column contains a series of unique terms or codes. The remaining columns then contain definitions of that code, but in different levels of detail. Using our geographical example, we might wish to classify the rivers. The look-up table would contain a column of codes. Another column could then contain some broad classification such as `major river', `minor river' and `stream.' A third column could then further subdivide the classification, major rivers might be divided into `tidal' and `non-tidal', and a fourth column could divide `tidal' into `estuarine' and `non-estuarine'. The SITETYPE table discussed in section 10.6 is a good example of a hierarchical look-up table.

Hierarchical look-up tables have a dual function--to restrict the entries in a second table (sometimes called a parent table), and to provide a mechanism by which complex queries can be simplified. Both types of look-up table can be used to create printed output from the database which is more readily understood, by replacing a series of possibly obscure codes with more descriptive pieces of text.
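A hierarchical look-up table for the river classification might be sketched as follows. The codes, table names and column names are invented for illustration, and the syntax is MySQL-style.

```sql
-- Look-up table: one unique code per row, described at three levels of detail
CREATE TABLE river_types (
    type_code     VARCHAR(10) PRIMARY KEY,
    broad_class   VARCHAR(20) NOT NULL,  -- 'major river', 'minor river', 'stream'
    tidal_class   VARCHAR(20),           -- 'tidal' or 'non-tidal' (major rivers only)
    estuary_class VARCHAR(20)            -- 'estuarine' or 'non-estuarine' (tidal only)
);

INSERT INTO river_types VALUES
    ('MAJ-TE', 'major river', 'tidal',     'estuarine'),
    ('MAJ-TN', 'major river', 'tidal',     'non-estuarine'),
    ('MAJ-N',  'major river', 'non-tidal', NULL),
    ('MIN',    'minor river', NULL,        NULL),
    ('STR',    'stream',      NULL,        NULL);

-- The foreign key restricts rivers.type_code to the codes listed above
CREATE TABLE rivers (
    river     VARCHAR(50) PRIMARY KEY,
    type_code VARCHAR(10) NOT NULL,
    FOREIGN KEY (type_code) REFERENCES river_types(type_code)
);

-- One condition on the broad class replaces a list of individual codes,
-- and the join swaps the code for readable text in the output
SELECT r.river, t.broad_class, t.tidal_class
FROM rivers AS r
JOIN river_types AS t USING (type_code)
WHERE t.broad_class = 'major river';
```

A simple look-up table would look the same with only the code column and, optionally, one definition column.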

Database applications

Relational database management systems (RDMS) will typically provide a series of tools for creating tables, conducting searches, producing printed reports, and so on. With a complicated database, however, it is usual for a database application to be written. A database application is usually a program within a program: it is a program that runs inside the RDMS. Most, if not all, RDMSs provide an `application development language.' This will allow a computer programmer to create an application to perform specific tasks for a particular database, most commonly to provide a simpler and more efficient method of inputting data to the database, and for checking for errors. Often this will use a series of forms with menus and buttons.

Conclusions

This chapter has provided an overview of the concept of databases, and has presented detail relating to the concept of relational databases, their structure and requirements. For those wishing to go further the database Bible remains Date's An Introduction to Database Systems [Date 1995]; Carter (1992) provides a less comprehensive but perhaps more comprehensible account for non-specialists.

The following chapters examine the content and structure of the CISP database in general, and then provide a data definition guide to all tables and fields.

Mike Gahan 2000-10-18
