What makes a great database design

5 database design.


1 5 Database design At the beginning of every database application is the design of the database. It has a major impact on how efficient the application becomes, how simple or complicated programming and maintenance become, and how flexible the solution is in the event of any change requests. Errors that occur in the design phase can only be corrected later with a great deal of effort. This chapter goes into the basics of relational databases, summarizes the data and table types available for selection under MySQL, gives concrete examples of MySQL database schemes, shows how indexes enable more efficient table access, and finally explains how databases and their tables specifically be generated. 5.1 Introduction to database theory MySQL data types, indices MySQL table types, creating databases, tables and indices Example mylibrary (library) Example myforum (discussion forum) Example exceptions (special cases) 196

2 140 5 Database Design 5.1 Introduction Correct database design is undoubtedly one of the greatest challenges in the development of complex database applications. This chapter tries to give a first introduction. But don't expect any magic bullets! Database design has a lot to do with practical experience, and that cannot be adequately conveyed in a single chapter. So that you don't lose your bearings in the relatively long chapter, here is a first overview: The two following sections convey basic knowledge. On the one hand, it is about how data can be logically distributed over several tables. The ultimate goal is to avoid redundancy. The so-called normalization process helps with this. On the other hand, the data types available under MySQL are presented. For example, the section answers the question of which data type is best suited for storing amounts of money, long and short strings, etc. Next, the chapter looks at some advanced topics: How can indexes speed up the process of finding and sorting tables? What types of tables does MySQL know about and when is it recommended to use them? How are databases and their tables actually created? The next section introduces various CREATE commands that help with this. (Creating new tables is much more comfortable if you use a user interface such as phpmyadmin instead!) All theory is gray! That is why the final sections give concrete examples for the design of several databases for the administration of a small library, for the administration of a discussion forum etc. REFERENCE In this chapter it is not possible to avoid introducing SQL commands every now and then, although SQL actually only in the next chapter is presented in detail. (Conversely, it is also impossible to describe SQL without assuming a specific database layout.) So you may sometimes have to flip between this and the next chapter. Further Reading There are countless books that deal exclusively with database design and SQL, regardless of specific database systems. Of course they go

3 5.2 Database Theory 141 Opinions differ as to which books are good. Just consider the following recommendations as my personal hit list. Joe Celko: SQL for Smarties, Morgan Kaufmann Publishers, (Not for SQL beginners; many examples are currently incomprehensible in MySQL because MySQL is not yet sufficiently ANSI-SQL / 92-compatible; nevertheless a great, example-oriented SQL book with Level.) Judith S. Bowman et al: The Practical SQL Handbook, Addison-Wesley, (new edition planned for 2001.) Michael J. Hernandez: Database Design for Mere Mortals, Addison-Wesley, (The first half is lengthy, but the second Half excellent, easy to understand.) The MySQL manual recommends another book as an introduction to SQL (which I don't know personally): Martin Gruber: Mastering SQL, Sybex, If you don't want to spend any money on another book and just want to If you are interested in database design, the rather compact introduction to the design of relational databases by Fernando Lozano may be enough: Database theory, normal forms Book authors only ever think of books! So who is surprised that the example in this section is also about books? The aim of the example is to create a small database in which data about books is stored: title of the book, publisher, authors, date of publication etc. This data can of course be stored without a database in a simple list in text format, like you you can find it at the end of this book with the references: Michael Kofler: Linux. Addison-Wesley Michael Kofler, David Kramer: Definitive Guide to Excel VBA. apress Robert Orfali, Dan Harkey, Jeri Edwards: Client / Server Survival Guide. Addison Wesley Tobias Ratschiller, Till Gerken: Web Application Development with PHP 4.0. New Riders 2000.

4 142 5 Database Design In itself, this list contains all of the information required. Why should you even bother converting the text (which may be a Word document) into a database? Well, there are a lot of reasons: while the list makes it easy to search, it is not possible to organize the data in any other way: for example to create a partial list of all books by author x or to create a new list that does not follow Author names, but sorted by title. The First Attempt So think to yourself: Nothing is easier than converting this list into a database table. (In the following, titles and author names are abbreviated for reasons of space.) Title publisher year author1 author2 author3 Linux Addison-Wesley 2000 Kofler M. Definitive Guide ... apress 2000 Kofler M. Kramer D. Client / Server ... Addison-Wesley 1997 Orfali R. Harkey D. Edwards E. Web Application ... New Riders 2000 Ratschiller T. Gerken T. Figure 5.1: Book database, first attempt This table is obviously problematic: Even at first glance, it is noticeable that the limitation to three authors is arbitrary is. What do you do with a book written by four or five authors? Simply insert more and more authorn columns, most of which will then always remain empty? The first normal form Fortunately, the database theorists have found a recipe for such problems: Apply the rules for the three so-called normal forms to your database one after the other! The rules for the first normal form are (translated from the language of the theorists into understandable German): Columns with similar content must be eliminated. A separate table must be created for each group of related data. Each data record must be clearly identifiable by a so-called primary key. The first rule in this example obviously relates to the authorn columns.

5 5.2 Database Theory 143 The second rule does not seem to apply here. In this example, it is only about data that belongs specifically to the respective book. One table is therefore sufficient. (It will soon be found out that this is a mistake.) The third rule means in practice that a consecutive and above all unique number must be introduced that identifies each line. (Strictly speaking, it is not necessary that a number is used as the primary key. Formally, only uniqueness is important. For reasons of efficiency, however, the primary key should also be as small as possible. An integer is therefore much more suitable than a character string of variable length.) A redesign the table according to the first and third rule gives the following result: id title publisher year author 1 Linux Addison-Wesley 2000 Kofler M. 2 Definitive Guide ... apress 2000 Kofler M. 3 Definitive Guide ... apress 2000 Kramer D. 4 Client / Server ... Addison-Wesley 1997 Orfali R. 5 Client / Server ... Addison-Wesley 1997 Harkey D. 6 Client / Server ... Addison-Wesley 1997 Edwards E. 7 Web Application ... New Riders 2000 Ratschiller T. 8 Web Application ... New Riders 2000 Gerken T. Fig. 5.2: Book database, first normal form It is easy to see that there are no longer any problems with the number of authors. No matter how many authors a book has, it can be stored in this table. The price for this is high, however: the contents of the title, publisher and year columns are repeated with each author. That cannot be the last word in wisdom! Second normal form The rules for the second normal form are as follows: Whenever content is repeated in columns, the tables must be broken down into several sub-tables. The tables must be linked to one another using so-called foreign keys. If you are new to the database language, foreign key is probably a strange concept to you. Colloquially, it would be better to translate foreign keys with cross-references: It is simply a reference to a line in another table.

6 144 5 database design belle. For programmers, the designation pointer would be obvious, in Internet language one could also speak of a link. In Figure 5.2, data is repeated in almost all columns. The reason for this obviously lies in the authors' column. The first attempt to outsource the authors to a separate table leads to the result that you can see in Figure 5.3 and Figure 5.4. title table titleid title publisher year 1 Linux Addison-Wesley Definitive Guide ... apress Client / Server ... Addison-Wesley Web Application ... New Riders 2000 Figure 5.3: title table, second normal form author table authorid titleid author 1 1 Kofler M. 2 2 Kofler M. 3 2 Kramer D. 4 3 Orfali R. 5 3 Harkey D. 6 3 Edwards E. 7 4 Ratschiller T. 8 4 Gerken T. Figure 5.4: author table, second normal form at the author table uses the first column with the consecutively numbered authorid values ​​as the primary key. The second column does the job of the foreign key: it refers to rows in the title table. For example, line 7 of the author table means that Ratschiller T. is an author of the book with titleid = 4 (i.e. the book Web Application ...). Second normal form, second attempt The result is not yet completely optimal. Kofler M. appears twice in the author table. The more books are inserted into this database, the more often there will be such redundancies whenever an author has worked on more than one book. The only possible solution to this problem is to split up the authors table again and do without the titleid column.

7 5.2 Database theory 145 which book belongs to which author must be indicated in a third table. The three tables then look like in Figure 5.5 to Figure 5.7. title table titleid title publisher year 1 Linux Addison-Wesley Definitive Guide ... apress Client / Server ... Addison-Wesley Web Application ... New Riders 2000 Figure 5.5: title table, second normal form author table authorid author 1 Kofler M. 2 Kramer D. 3 Orfali R. 4 Harkey D. 5 Edwards E. 6 Ratschiller T. 7 Gerken T. Figure 5.6: author table, second normal form rel_title_author-table titleid authorid Figure 5.7: rel_title_author-table, second normal form This step is undoubtedly the most difficult and probably the most abstract because a table of the form rel_title_author does not correspond to any human approach. Rel_title_author would be completely unsuitable for manual data management. But computers aren't human!

8 146 5 Database design It is no problem for the computer to merge the data (at least not as long as a program like MySQL helps it!). Suppose you wanted a list of all the authors of the Client / Server ... book. MySQL would then first look in the title table to see which titleid number this book has. Then it would look in the rel_title_author table for records containing this number. The associated authorid numbers then lead to the authors. NOTE You may be wondering why there is no separate ID column in the rel_title_author table, e.g. B. rel_title_author_id. Such a column is usually not used because the combination of titleid and authorid is already an optimal primary key. (Relational database systems also allow primary keys that are composed of several columns.) Third normal form The only rule of the third normal form is: Columns that are not directly dependent on the primary key of a table must be eliminated (i.e. moved to a separate table ). In this example, this affects the publisher column in the title table. The list of publishers and the list of book titles are independent information units and should therefore be kept separate. Of course, the publisher in which it was published should continue to be noted for each title, but the entire publisher does not have to be named every time, a foreign key (i.e. a reference) is sufficient. title table titleid title publisherid year 1 Linux Definitive Guide Client / Server Web Application Figure 5.8: title table, third normal form publisher table publisherid publisher 1 Addison-Wesley 2 apress 3 New Riders Figure 5.9: publisher table, third normal form

9 5.2 Database Theory 147 The tables author and rel_title_author no longer change in the third normal form. The entire book database now consists of four tables (see Figure 5.10). Figure 5.10: The schema of the book database If the rules of the first normal form were studied more closely (data belonging together belong in separate tables), some intermediate steps could of course have been saved, but that would have reduced the didactic value of the example. In fact, in practice it is often the case that it is only when inserting test data and based on the redundancies that arise that you notice where tables have to be broken down. Naming In this example, an attempt was made to leave field names unchanged from the beginning (without normalization) to the end in order not to increase confusion. The result is not quite optimal, and it would be worth considering to revise the entire table design now with regard to the naming. The following points provide some information on naming tables and columns (fields). MySQL differentiates between upper and lower case in tables, but does not distinguish between columns (see also Chapter 13). In this respect, it is important, at least for tables, to use the same upper and lower case. Of course, field and table names should be as clear as possible. Field names like author or publisher are not particularly good examples here. First, they match the table names, which can be confusing, and second, name or authorsname or companyname would actually be more precise. A uniform scheme for the naming of fields saves a lot of careless mistakes: Regardless of whether you are more likeable to authors_name or authorsname, stay consistent. You should also think about how to deal with singular and plural: In this example, I could have named the tables title, author and publisher titles, authors and publishers. There is no rule here as to what is right, however

10 148 5 Database design it is definitely confusing when half of the tables use the singular and the other half use the plural. Finally, it is worth considering whether or not field names should also provide information about the table. In this example there are fields such as titleid, publisherid etc. As far as primary keys are concerned, these fields could simply have been called id. This is sufficiently clear because the table name often has to be specified in SELECT queries that include several tables. (SQL uses the notation table-name.field-name for this.) This then leads to rather cumbersome statements such as WHERE publisher.publisherid = title.publisherid. On the other hand, there are also foreign keys in many tables and the specification of the table name is inevitable there. (You cannot designate two fields with id in the title table!) That would then lead to publisher.id = title.publisherid, which is also not ideal. (As you can see, there are no patent remedies.) REFERENCE Another sample database for this book has emerged from books, namely mylibrary. The design of this database is described in Section 5.7. In terms of naming, the database is more optimal than books. mylibrary is the basis for many of the examples in this book. More theory ... The three normal forms for relational databases were first formulated by the scientist E.F.Codd. They still represent the basis for a separate branch of research that deals with the formal description of mathematical sets in general and of relational databases in particular. Depending on the literature, up to three more are added to the three normal forms described here, but these are of subordinate importance in practice. The normal forms and their rules are described much more precisely than here. However, such descriptions are teeming with entities, attributes, etc. that the connection with relational databases is easily lost. If you are interested in more details, please take a look at a good book on database design (see also the notes at the beginning of this chapter). Less theory ... I have tried to present the first three normal forms here as simply and example-oriented as possible, but perhaps you were still too theoretical.

11 5.2 Database Theory 149 table.In fact, the normal forms are of limited use, especially for database beginners, because the correct interpretation of the rules is often difficult. Here are a few rules to get you started: Take your time with the database design. (If you have to change the database schema at a later point in time when the database is already filled with real data and there is already client code, the time required for this is even greater!) Avoid columns with enumerations (name1, name2 or object1 , object2 etc.). There is certainly a better solution here with a second table. Immediately fill your database with some test data. Try to cover as many special cases as possible. If there is redundancy, i.e. if the same content occurs several times within a column, this is almost always a wink with the fence post! Break the table down into two (or more) new tables. Try to understand the concept of relations (see also the following section). A good database design is impossible until you have some experience with SQL (see also the next chapter). Only when you know and understand the query options of SQL can you assess the consequences of organizing the data. Use sample databases as a guide (in this book or in other database books). TIP You can find a nice example for the normalization of a database at the following place on the Internet: Normal forms Pros and Cons Normal forms are a means to an end, no more and no less! Normal forms are intended to aid in the design of databases, but they cannot replace common sense. In addition, it does not always make sense to slavishly follow normal forms, i.e. to really dissolve any redundancy. Cons: Entering new data in web forms, for example, is more complicated the more tables there are to which the entries have to be distributed. This applies equally to the end user (who is usually guided from one page to the next) and the programmer.

12 150 5 Database Design For reasons of query efficiency, it is sometimes better to accept a bit of redundancy. Merging data from several tables is in most cases slower than reading the data from a single table. This is especially true for databases that change little, but for which complex queries are often performed. (In a database specialty, the so-called data warehouses, redundancy is often consciously accepted in order to achieve higher speed. The goal of data warehouses is to analyze complex data according to different criteria Tasks, which is why the peculiarities of this special application are not discussed further here.) Pro: Redundancy is usually a waste of storage space. You may be of the opinion that this is indifferent in the age of 100 GB hard drives, but a large database inevitably becomes a slow database (at the latest as soon as the database size exceeds that of the RAM). As a rule, databases in normal form offer much more flexible query options. (Unfortunately, this is almost always noticeable when a new form of data query or grouping is required, often after the database has been in operation for months.) Relations As soon as you convert a database into normal forms, you have to link several tables together. These links are called relations in the database language. Basically there are three possible relationships between two tables: 1: 1 Unique relationship between two tables: Each data record in one table corresponds to exactly one data record in the other table. Such relationships are rare because the information from both tables could then also be stored in a single table. 1: n A data record in the first table can appear in several data records in the second table (e.g. a seller with several orders). Conversely, no ambiguity is possible, an order can only be carried out by one seller. Occasionally, there is also talk of an many-to-one relationship, which, however, is identical to a one-to-many relationship (only the direction of view is different). n: m A data record in one table can appear in several data records in the other table and vice versa (e.g. several different articles in one order, an article in several different orders, books and their authors).

13 5.2 Database Theory 151 1: 1 Relationships A 1: 1 relationship is most likely used in practice when a table is split into two parts that use a common primary key. The easiest way to understand this is with an example: A table with information on the personnel of a company contains a great deal of information: name, department, date of birth, date of employment, salary, etc. This table could now be broken down into personal and personal_extra, whereby personal is often required and generally accessible information, personal_extra contains additional, less frequently used and more private data. There can be two reasons for such a separation. One is the security aspect, it is easy to protect the personal_extra table from general access. (For MySQL, this argument is only valid to a limited extent, because the access rights can be set separately for each individual column of a table anyway.) The other reason is speed: If a table contains a large number of columns, but only a few of them for most queries are needed, it is more efficient to summarize these columns in a table. (Ideally, the first table contains only columns of a specified size. Such tables are more efficient to manage than tables that also contain columns of variable size. For more information, see Section 5.5, which gives an overview of the table types supported by MySQL.) The main disadvantage of separating the tables is that you have to constantly ensure that the tables are in sync. 1: n relations 1: n relations are always used when a certain field of a data record can take on a value in another table. The first table is sometimes referred to as the master table and the second as the detail table. The link is made using key fields. The detail table has a primary key for this purpose. In the master table, a foreign key field contains references to the data records in the detail table. Examples: Normalization example of the previous section (book database): Here there is a 1: n relation between the title and the publisher table. Each book title can be assigned to one of n possible publishers. So the title table is the master table and the publisher table is the detail table. publisher.publisherid is the primary key of the publisher table, title.publisherid is the foreign key of the title table (see Figure 5.10). Business application, table with orders: The table contains data on all orders received. It is the master table. A possible

14 152 5 Database design The detailed table for this would be the customer table. With every order, a foreign key refers to the customer who placed the order. Discussion forum, table with messages: The table contains the data for each discussion item available on the website (title, text, date, author, group, etc.). Two possible detailed tables would be the group table with the list of all discussion groups and the author table with the list of all website members who are allowed to post. CD database, table with CDs: The table contains the data of each stored CD (title, band name, number of CDs, etc.). Two matching detailed tables would be the band table with a list of all music groups represented in the database and a label table with a list of the record labels. NOTE Sometimes when designing a database an attempt is made to give the same names to fields in two tables that are later to be linked by a relation. This promotes clarity, but is by no means a requirement. Table diagrams in this book (as in Figure 5.10) were created with Microsoft Access. Despite all the technical disadvantages that Access has compared to MySQL due to the file server architecture, the user interface is efficient to use. There is hardly a program with which you can draw relation diagrams so quickly. However, Access has the peculiarity of designating 1: n diagrams with 1:. 1 denotes the detail table, the master table. Sometimes an arrow is used instead of this nomenclature. In principle, it is also possible for the primary key and the foreign key to be in the same table. Then a record in a table refers to another record in the same table. This is always useful when hierarchies are to be displayed. Here are a few examples: Personnel table: For every employee (apart from the boss), a field refers to the higher-level person (the group leader, department head, etc.). Discussion forum, table with messages: For each message, a field refers to the higher-level message (which was replied to here). Music database, table with musical styles: For each style, a field refers to the superordinate group (e.g. bebop as a style within the jazz category, as well as emo as a style within the punk category; see

15 5.2 Database theory 153 NOTE References within a table allow hierarchies to be saved very easily, but the evaluation of such data is usually less easy. For example, the query command SELECT does not provide the possibility of recursion, which is often required for the analysis of hierarchical relationships. So let yourself go through the query options before you carelessly build complex hierarchies on the basis of internal table references. The mylibrary database presented in Section 5.7 gives a concrete example for the representation of hierarchies. n: m relations For n: m relations, an additional table is required between the two output tables, via which the n: m relation is reduced to two 1: n relations. Examples: Normalization example (book database): Here there is an many-to-many relationship between book titles and authors. The relation is established via the rel_title_author table. Business application, table with orders: In order to establish a relationship between the order and the items ordered, the table in between specifies how many pieces of item x are to be delivered with order y. University administration, examination lists: In order to save which student completed which examination (when and with which grade), a table is required, which is located between the student table and the examination table. Primary and foreign key relations are directly related to primary and foreign keys. This section provides summary information on these two terms and their application. Unfortunately, it cannot be completely avoided that this section anticipates SQL commands that are only introduced at the end of this chapter or in the next chapter. Primary key The primary key is to find a specific data record in the table as quickly as possible (e.g. the data record with id = off

16 154 5 database design of one million records). This operation has to be performed every time data from several tables is merged, so very often! Most database systems, including MySQL, also allow primary keys that are composed of several fields in a table. Regardless of whether one or more fields are used as the primary key, the following properties should apply: The primary key must be unique. It must be ruled out that two data records have the same content in the primary key field. The primary key field should be compact. There are two reasons for this: Firstly, an index (the primary index) must be managed for the primary key field so that the search for data records (id =) takes place as quickly as possible. The administration of this index is more efficient, the more compact the primary key field is. Therefore, an integer number is more suitable as a primary key field than a character string of variable length. Second, the content of the primary key field is used as a foreign key in other tables, and there, too, it is efficient if the foreign key is as compact as possible. (Relationships between tables are introduced not least to avoid wasting space through possible redundancies. Of course, this only makes sense if the management of key fields does not take up more space!) In most database systems, it has become established as a primary key field to use a 32- or 64-bit integer that is automatically generated by the database system as a consecutive number (1, 2, 3 ...). This means that neither programmers nor users need to worry about how a new and unique primary key value can be found for each new data record. In MySQL, such fields are declared as follows: CREATE TABLE publisher (publisherid INT NOT NULL AUTO_INCREMENT, othercolumns ..., PRIMARY KEY (publisherid)) Translated from SQL into German this means: The publisherid field must not contain NULL. Its content is generated by the database (if no other value is explicitly specified when inserting). The field acts as a primary key, i.e. MySQL creates an additional index file to enable a quick search. At the same time, this ensures that the publisherid value is unique when new data records are inserted (even if a certain publisherid value is specified in the INSERT command). For tables in which a large number of entries or changes are to be expected, BIGINT (64 bit) should usually be used instead of INT (32 bit).

17 5.2 Database theory 155 NOTE Incidentally, the name of the primary key field is irrelevant. Most of the times in this book id or tablenameid is used. Often you will also find combinations with no or no, such as customer no or customerno. Foreign key The role of a foreign key field is to refer to a record in the details table. However, this reference only comes about through the formulation of a database query, for example in the following form: SELECT title.title, publisher.publisher FROM title, publisher WHERE title.publisherid = publisher.publisherid ORDER BY title This creates an alphabetical list of all book titles, in which the publisher in which the book was published is indicated in the second column. The result would look like this: title Client / Server ... Definitive Guide ... Linux Web Application ... publisher Addison-Wesley apress Addison-Wesley New Riders The key point in the query is the WHERE title.publisherid = publisher.publisherid clause. This establishes the connection between the two tables. When declaring a table, however, the foreign key field does not play a special role. For MySQL, a foreign key field is a very common table field. No special keywords are required. In particular, no index is required (the content of the foreign key is almost never searched for). Of course, you cannot specify the AUTO_INCREMENT attribute here either. You want to specify yourself which data record the field refers to! You should only make sure that the foreign key field is declared in the same data type as the primary key field, otherwise the evaluation of the WHERE condition will be very slow. CREATE TABLE title (othercolumns ..., publisherid INT NOT NULL) Whether you specify the NOT NULL attribute depends on the context. In most cases, NOT NULL is recommended in order to avoid incomplete entries from the outset. But if you want to allow in the book database, for example,

18 156 5 Database design you can do without NOT NULL so that books can also be saved without a publisher's name. Referential Integrity If you delete the author Kofler from the author table of the normalization example database (see Figure 5.6 to Figure 5.10), you will run into problems with many SQL queries that access the books Linux and Definitive Guide. The authorid number 1 specified in the rel_title_author table no longer exists in the author table. In the database language it would be formulated like this: The referential integrity of the database is violated. As the database developer, you are responsible for ensuring that this cannot happen. Before deleting a data record, you must therefore always check whether there is a reference to this data record in another table. Since programmers cannot always be relied on (and databases sometimes have to be changed manually), many databases have rules for maintaining referential integrity. With every change in the database, such rules test whether cross-references between tables are affected. Depending on the declaration of the foreign key, there are two possible consequences: Either the operation in question is simply not carried out (error message), or all data records in the dependent tables are deleted at the same time. Which approach is more economical depends on the content of the data. This question does not currently arise under MySQL because MySQL does not yet care about referential integrity.It is to be hoped that this function will be added later. For this reason (but also to achieve extensive ANSI compatibility) the corresponding SQL keywords are already provided. So if MySQL already masters referential integrity when you hold this book in your hands, you can probably declare foreign key fields like this: CREATE TABLE title (column1, column2, ..., publisherid INT NOT NULL, FOREIGN KEY (publisherid) REFERENCES publisher ( publisherid) [options]) This means that title.publisherid is a primary key that points to publisher.publisherID. Possible options would be RESTRICT or ON DELETE CASCADE. The options therefore specify how the database system should behave in the event of a possible violation of referential integrity.

19 5.3 MySQL data types 157 Incidentally, the statement is already being executed without errors; however, the FOREIGN-KEY information is ignored and is not even saved at the moment. (At least the second is expected to change soon: Even if MySQL does not care about compliance with referential integrity, the corresponding table declarations should be retained. This would be particularly useful for programs for formulating SQL queries and for graphical user interfaces, who would then know how the tables are linked.) Other Indexes LOOKUP The primary index inextricably linked with the primary key is a special case of an index. The purpose of indexes is always to speed up the search and sorting of tables according to the data in different columns. Other indexes supported by MySQL are presented in Section 5.4. 5.3 MySQL data types Now that the primary aim has been to distribute the data as sensibly as possible over several tables, this section goes one level deeper: Here you will find information about the data types that MySQL provides for each field (for each column). Should a character string be stored in a fixed or variable length field? Should floating point or fixed point numbers be used to store amounts of money? What do you do with binary objects? The section also partly deals with the attributes that can be specified when defining columns in order to additionally describe a data type. REFERENCE Although this section describes the data types, it does not describe the CREATE TABLE command with which tables are actually created. Information on how tables are actually created, i.e. how the planned database design is implemented in reality, can be found in Section 5.6. Integer numbers: xxxint (m) TINYINT SMALLINT MEDIUMINT INT, INTEGER BIGINT 8-bit integer (1 byte) 16-bit integer (2 bytes) 24-bit integer (3 bytes) 32-bit integer (4 bytes ) 64-bit integer (8 bytes)

20 158 5 Database design Both positive and negative numbers are normally allowed for the INT data types. With the UNSIGNED attribute, the range of numbers can be restricted to positive numbers only. With TINYINT, numbers between -128 and +127 are allowed. With the UNSIGNED attribute, on the other hand, the number range changes to When you try to store numbers that are outside the permitted range, MySQL simply replaces the values ​​with the largest or smallest permitted number. Optionally, the desired column width (number of digits) can be specified when defining an integer field, e.g. B. INT (4). The parameter is referred to in the literature as M (maximum display size). It helps MySQL and various user interfaces with the clear formatting of query results. NOTE Please note that M for the INT data types does not restrict the permitted number range or the permitted number of digits. Despite INT (4), you can, for example, store numbers greater than 9999. However, in rare cases (for example in the case of complicated queries, for the evaluation of which MySQL automatically creates temporary tables) it can happen that numerical values ​​in the temporary tables are truncated and thus incorrect results are obtained. The AUTO_INCREMENT attribute ensures that when a new data record is created for this field, MySQL automatically inserts a number that is one greater than the currently largest value in this column. However, the attribute is only permitted if the PRIMARY KEY or UNIQUE attribute is used at the same time. This also shows the typical use of the AUTO_INCREMENT attribute: It is mostly used when defining fields that are to serve as the primary key of a table. The automatic generation of an ID value only works if no specific value or NULL is specified for the field when inserting a new data record. However, it is possible to create a new data record with a very specific ID value (if the value has not yet been assigned). Floating point numbers: FLOAT (M, D) and DOUBLE (M, D) FLOAT DOUBLE REAL Floating point number, 8 digits precision (4 bytes) Floating point number, 16 digits precision (8 bytes) Synonym for DOUBLE Since version MySQL 3.23, the types FLOAT and DOUBLE correspond the single and double precision IEEE number types available in many programming languages.

21 5.3 MySQL data types 159 The number of digits for FLOAT and DOUBLE can optionally be set using the two parameters M and D. In this case, M indicates the number of digits before the decimal point, D those after the decimal point. M only helps with the formatting of the numbers, but does not limit the range of numbers. D, on the other hand, rounds the numbers when saving. If you try to store the number in a field with DOUBLE (6,3), it will actually be stored. NOTE MySQL expects floating point numbers in international notation, i.e. with a decimal point (no comma!). For this reason, decimal numbers are also given in this book with a point. Results of queries are also provided in this notation, very large or very small numbers also in scientific notation (e.g. e + 017). If you want to format floating point numbers differently, you must either use the FORMAT function in the SQL query (this function only helps with grouping thousands of numbers) or you must format them in the client programming language (i.e. in PHP, Perl, etc.) .). Fixed-point numbers: DECIMAL (P, S) DECIMAL (p, s) NUMERIC, DEC Fixed-point number, is saved as a character string; Any number of digits (one byte per digit + 2 byte administration) Synonyms for DECIMAL The number type DECIMAL is always recommended if rounding errors due to the internal binary representation of numbers in FLOAT or DOUBLE are to be excluded, for example with monetary amounts. Since the numbers are stored as strings, the memory requirements are much larger. At the same time, however, the possible range of values ​​is significantly smaller because no exponential notation is provided. The two parameters P and S indicate the total number of digits (precision) or the number of digits after the decimal point (scale). The number range for DECIMAL (6,3) extends from to This strange range results from the fact that obviously six digits plus one character are reserved for the negative sign. In the case of positive numbers, the digit for the sign is also used to store a digit. If P and S are not specified, MySQL automatically uses (10,0) so that positive numbers can be stored with eleven digits and negative numbers with ten digits.

22 160 5 Database design date and time: DATE, TIME and DATETIME DATE date in the form '', range up to (3 bytes) TIME time in the form '23: 59: 59 ', range +/- 838: 59: 59 ( 3 bytes) DATETIME Combination of DATE and time in the form ': 59: 59' (8 bytes) YEAR year (1 byte) Only a partial validation takes place for the DATE and DATETIME data types. Months between 0 and 12 and days between 0 and 31 are generally allowed. It is the responsibility of the client program to transfer correct data. (0 is allowed as a month or day number, in order to give the possibility to also save incomplete or partially unknown data.) The obvious question is why DATETIME takes up eight bytes, although DATE and TIME each get by with three bytes remained a mystery. MySQL delivers results of queries in the form. With INSERT or UPDATE operations, MySQL can also handle other formats as long as the order year / month / day is adhered to and the information is given numerically. If the year is given as two digits, the following rules of interpretation apply: 70-99:: If query results are to be specially formatted, the various MySQL functions for processing date and time values ​​help. The most flexible is DATE_FORMAT, the application of which is demonstrated in the following example: SELECT DATE_FORMAT (birthdate, '% Y% M% e') FROM students 2000 September October 25 .. Time of the last change: TIMESTAMP (M) TIMESTAMP Date and time in the Form for times between 1970 and 2038 (4 bytes) Among the data types for date and time, TIMESTAMP has a special role. Fields of this type are automatically updated each time the data record is changed and thus reflect the time of the last change. TIME-STAMP fields are therefore usually only used as an aid for internal administration, but not for storing your own data (although this is also possible).

23 5.3 MySQL Data Types 161 In order for the automatic TIMESTAMP update to work, either no explicit value at all is assigned to the field or it is assigned NULL. In both cases, MySQL uses the current time itself. When declaring a TIMESTAMP column, the desired column width M can be specified. The TIMESTAMP values ​​resulting from the queries are then cut off accordingly (with M = 8, for example, at a date without a time). Internally, however, the values ​​are still saved with full accuracy. NOTE Some database operations with certain client libraries (for example with MyODBY) only work if each table in the database has a TIMESTAMP column. The time of the last change is therefore often required for the internal administration of the data. Character strings CHAR (n) VARCHAR (n) TINYTEXT TEXT MEDIUMTEXT LONGTEXT Character string with a specified length, maximum 255 characters (n bytes) Character string with variable length, maximum n characters (n <256) Memory requirement: one byte per character (actual length) + 1 Character string with variable length, maximum 255 characters (n + 1 byte) Character string with variable length, maximum characters (n + 2 bytes) Character string with variable length, maximum characters (n + 3 bytes) Character string with variable length, maximum characters (n + 4 bytes) With CHAR, the length of the character string is fixed. CHAR (20) takes up 20 bytes in each data record, regardless of how long the character string actually stored is. (Spaces at the beginning of the string are eliminated before saving. Strings that are too short are lengthened at the end with spaces. These spaces are automatically removed when the data is read out, so that it is not possible to save spaces at the end of a string.) In contrast to this, the length of the character string with VARCHAR and the four TEXT types is variable. The memory requirement depends on the actual length of the character string. Although VARCHAR and TINYTEXT, which can both hold up to 255 characters, look the same at first glance, there are several differences: The maximum number of characters in VARCHAR columns must be specified when the table is declared (in the range 0 to 255). Strings that are too long are cut off when saving without an error message. In contrast to this, no length limitation is possible for xxxtext columns. (The only limitation is the maximum length of each type of text.)

24 162 5 Database design In VARCHAR columns, as in CHAR columns, blanks at the beginning of the character string are removed. (This behavior is only ANSI-compliant for CHAR columns, but not for VARCHAR columns. This behavior may change in future versions of MySQL.) With xxxtext columns, character strings are stored exactly as they are specified. In MySQL 3.22, only VARCHAR columns can be indexed (but not xxx-TEXT columns). This limitation was corrected in MySQL 3.23. TEXT columns can optionally be given the attribute BINARY. They then essentially behave like BLOB columns (see below). The BINARY attribute can even be useful if you are saving text (and not binary objects): This means that only the binary code of the characters is taken into account when sorting (but not any sorting tables). Likewise, when comparing, a distinction is made between upper and lower case (which is otherwise not the case). Basically, the internal handling of binary character strings is simpler and therefore faster than normal character strings. NOTE MySQL currently only supports strings for 8-bit character sets. Different sorting tables are available for such character sets, which can be selected when compiling MySQL and when starting the server. Unicode strings or other character sets in which characters are encoded with multiple bytes are not currently supported directly. However, there are sorting tables for some multi-byte character sets (see the MySQL Server Functions chapter in the MySQL manual). Unicode support is planned for future MySQL versions. Binary data TINYBLOB BLOB MEDIUMBLOB LONGBLOB binary data with variable length, maximum 255 bytes binary data with variable length, maximum bytes binary data with variable length, maximum bytes binary data with variable length, maximum bytes Four BLOB data types are provided for storing binary data, which have almost the same properties like the TEXT data types. (By the way, BLOB stands for Binary Large Object.) The only difference is that text data is usually compared and sorted in text mode (case-insensitive), while binary data is binary. But even this difference becomes obsolete if you declare TEXT fields with the additional attribute BINARY.

25 5.3 MySQL data types 163 NOTE It is generally controversial whether large binary objects should actually be stored in the database. The alternative is to save the data (e.g. images) in external files and only store links to these files in the database. The advantage of using BLOBs is the integration into the database (more security, easier backups), the disadvantage is the mostly significantly lower speed. It is particularly unfavorable that small and large data elements, i.e. strings, integers, etc., on the one hand, BLOBs and long texts on the other, have to be stored in a colorful mix in a table file. This also slows down access to all other data sets. Also note that BLOBs can generally only be read in full. (It is not possible to read the last 100 KB from a 800 KB BLOB; the entire BLOB must always be transferred.) Enumerations: ENUM, SET ENUM SET Selection of one of a maximum of character strings (1 or 2 bytes) combination of maximum 255 Strings (1-8 bytes) A ​​specialty of MySQL are the two enumerations ENUM and SET. With ENUM you can manage a list of up to character strings to which consecutive numbers are assigned. One of these character strings can then always be selected in the field. In queries, no distinction is made between upper and lower case when comparing strings. In addition to the predefined character strings, an empty character string can also be stored in the field (as well as NULL, if this was not excluded with NOT NULL). The field is handled in a similar way to an ordinary character string field. The following commands show how a table with an ENUM enumeration is created and used. One of five predefined colors can be saved in the color field of the testum table. CREATE TABLE testum (color ENUM ('red', 'green', 'blue', 'black', 'white')) INSERT testum VALUES ('red') SELECT * FROM testum WHERE color = 'red' SET follows a similar one Idea, but any combination is possible here. Internally, the character strings are assigned powers of two (1, 2, 4, 8 etc.) so that a bit-by-bit combination is possible. Accordingly, it is larger

26 164 5 database design also the memory requirement (one bit per character string). A maximum of 63 strings can be combined. (The memory requirement is then eight bytes.) In order to store a combination of several character strings in a field, these must be separated by commas (but without spaces!). The order of the character strings is irrelevant and is not taken into account. In query results, combinations are always specified in the order in which the set was defined. CREATE TABLE testset (fontattr SET ('bold', 'italic', 'underlined')) INSERT testset VALUES ('bold, italic') For queries with the operator =, an exact comparison of the entire combination is carried out. As a result, only those data sets are delivered for which the combination is exactly correct. If only the data set inserted above was saved with 'bold, italic' in testset, then the following query does not return a result: SELECT * FROM testset WHERE fontattr = 'italic' To also find data records in which an attribute is set (independent combination with other attributes), the MySQL function FIND_IN_SET can be used. This function returns the position of the searched character string within the set (in this example 1 if 'bold' is found, 2 for 'italic' etc.).SELECT * FROM testset WHERE FIND_IN_SET ('italic', fontattr)> 0 TIP ENUM and SET values ​​are represented internally as numbers, not as character strings. If you want to determine the internally stored numerical value in queries, simply use SELECT x + 0 FROM table, where x is the column name of the ENUM or SET column. It is also permissible to save numerical values ​​with INSERT or UPDATE commands. NOTE The contents of ENUM and SET fields are not sorted alphabetically, but in the order in which the character strings available for selection were defined! (The reason is that MySQL works internally with the numerical values ​​assigned to the strings.) If you want to sort the column alphabetically, you must explicitly convert the column to a string, e.g. E.g. so: SELECT CONCAT (x) AS xstr ... ORDER BY xstr

27 5.4 Indices 165 TIP If you want to determine the list of all valid character strings for an ENUM or SET field (for example in a client program), you must use DE-SCRIBE tablename columnname. The SQL command supplies a table in which the columnname field is described. The Type column of this table contains the ENUM or SET definition. In section you will find an example of the evaluation of this information (in the PHP programming language). 5.4 Indexes Introduction If you want to find a particular record in a table or create an ordered table from a series of records, MySQL must load all the records in the table. The following lines show some corresponding SELECT commands (details on this follow in the next chapter): SELECT column1, column2 ... FROM table WHERE column3 = 12345 SELECT column1, column2 ... FROM table ORDER BY column3 SELECT column1, column2 ... FROM table WHERE column3 LIKE 'Smith%' SELECT column1, column2 ... FROM table WHERE column3> 2000 With large tables, such everyday queries naturally suffer from performance. Fortunately, there is a simple workaround: you need an index for the affected column (in the above examples for column3)! An index is an additional file with sorted cross-references to the records in a table. (So ​​a database index works very similarly to the index of this book. The index saves you the trouble of reading the entire book from cover to cover just to find out where a particular topic is described.) In principle, an index can be used for each field in the table up to a maximum of 16 indexes per table can be set up. (MySQL also allows indices for several fields at the same time. This is useful if a combination of fields is often searched for or sorted, for example country = 'austria' AND city = 'graz').

28 166 5 Database Design CAUTION Indexes are not a cure-all! Although they speed up access to data, they also slow down any change in the database. Every time a data record is changed, the index must also be changed. This disadvantage can be alleviated somewhat with the DELAY_KEY_WRITE option for various SQL commands. The option means that the index is not updated with every new or changed data record, but only every now and then. DELAY_KEY_WRITE is useful, for example, when as many new data records as possible should be inserted into the table as quickly as possible. Another obvious disadvantage of indexes is that they naturally require additional disk space. (Incidentally, B-trees are used internally to manage the index entries.) So only use indexes for those columns that you often search for or sort for. Indexes are largely useless if the column contains a large number of identical entries. (In such cases, you should rather ask yourself whether the normalization of the database could still be optimized.) Types of indexes Ordinary index The only task of an ordinary index (defined with the INDEX keyword) is to speed up access to the data . TIP Only index columns that you need in conditions (WHERE column = ...) or for sorting (ORDER BY column). If possible, index columns with compact data (e.g. integers). Do not index columns that contain many of the same values. (For example, it does not make sense to index a column with 0/1 or Y / N values!) Restrictions MySQL cannot use indexes if you use inequality operators (WHERE column! = ...). Likewise, indices cannot be used for comparisons in which the column content is processed with a function (WHERE DAY (column) = ...). In JOIN operations (that is, when joining data from several tables), indexes only help if the primary and foreign keys have the same data type.

29 5.4 Indices 167 If the comparison operators LIKE or REGEXP are used, an index only helps if there is no wildcard at the beginning of the search pattern. An index helps with LIKE 'abc%', but not with LIKE '% abc'. After all, indices are only used in ORDER-BY operations if the data records do not have to be selected beforehand according to other criteria. (Especially with queries in which the data records are put together from several tables, an index unfortunately only seldom helps to accelerate ORDER BY.) Unique index With a normal index it is allowed that several data records of the table in the indexed field have the same value exhibit. (For example, in a personnel table there can be twice the same name even though they are different people.) If the context makes it clear that a column contains unique values, you should define the index with the keyword UNIQUE. This has two consequences: Firstly, it is now easier for MySQL to manage the index, i.e. the index is even more efficient; Secondly, MySQL now ensures that you do not insert a new record if there is already another record with the same value in the indexed field. (Sometimes a UNIQUE index is only defined for this reason, not to optimize access, but to avoid duplicates.) Primary index The primary index mentioned again and again in section 5.2 is a very common UNIQUE index. The only special feature is that the index has the name PRIMARY. (You can also tell that a primary index is a very common index by the fact that MySQL simply uses the first UNIQUE index of the table to replace a missing primary index.) Compound indexes An index can also contain several columns, e.g. INDEX (columnA, columnb ). A special feature of such indexes is that MySQL can use such an index selectively. So if only one index for columna is required for a query, the composite index for INDEX (columnA, columnb) can be used. However, this only applies to sub-indices at the beginning of the index sequence. INDEX (A, B, C) can also be used as an index for A or for (A, B), but not as an index for B or C or (B, C) etc.

30 168 5 Database Design Limiting the Index Length When defining an index for CHAR and VARCHAR columns, you can limit the index to a certain number of characters (which is less than the maximum number of characters allowed in this field). This means that the resulting index file is smaller and its evaluation is faster. In most cases of application, i.e. for character strings with names, 10 to 15 characters are sufficient to reduce the search volume to just a few data records. You must apply this restriction to BLOB and TEXT columns, although MySQL allows a maximum index length of 255 characters. Full text index A normal index for text fields only helps when searching for character strings that are at the beginning of the field (i.e. whose first letters are known). If, on the other hand, you save texts in text fields that consist of several, possibly very many words, a conventional index has no effect. The search must be formulated in the LIKE '% word%' type, which is extremely time-consuming for MySQL and leads to unbearably long response times for large amounts of data. In such cases, a full-text index can be used. (This is a relatively new extension of MySQL that is only available from version onwards.) With this type of index, MySQL creates a list of all words that occur in the text. SELECT queries can then be used to search for records that contain one or more words. A separate query syntax applies: SELECT * FROM table WHERE MATCH (column1, column2) AGAINST ('word1', 'word2', 'word3') This will find all data records for which the words word1, word2 and word3 occur. REFERENCE The creation of a full-text index and the associated search and sorting options are among the advanced MySQL techniques and are therefore only presented in more detail in Chapter 12.2 Query and index optimization Basically, realistic performance assessments can only be carried out if the database has sufficient test data is filled. A test database with a few hundred data records is usually completely in RAM after the first three queries; all queries are answered very quickly with or without an index. It becomes interesting when tables contain well over 1000 data

31 5.4 Indexes contain 169 records or if the total size of the database exceeds that of the RAM on the MySQL server. When deciding which columns should be provided with indexes, the EXPLAIN SELECT command can be used under certain circumstances. To do this, simply put the keyword EXPLAIN in front of a normal SELECT command. Instead of simply executing the SELECT query, MySQL displays information in a table about how the query would be executed and which indexes (if any) would be used. Here are a few help for the interpretation of the table generated by EXPLAIN: In the first column the names of the tables are given in the order in which they are read from the database. The column type indicates how the table is linked with the other tables (JOIN). This works most efficiently (fastest) with the type system, the types const, eq_ref, ref, range, index and ALL are increasingly complex. (ALL means that all data records of this table have to be read in for each data record of the parent table. This can usually be prevented with an index. Further information on all JOIN types can be found in the MySQL manual.) The column possible_keys indicates on which indexes MySQL can use when searching for records. The key column indicates which index MySQL actually chose. key_len specifies the width of the index in bytes. For an index using an INTEGER column, the number of bytes is, for example, 4. key_len can also provide information about how many parts of a multi-part index are used. Basically, the smaller key_len is, the better (i.e. faster). The ref column specifies which column of another table is used to establish the link. rows contains an estimate of how many records MySQL can expect to read to execute the entire query. The product of all the numbers in the rows column allows conclusions to be drawn about how many combinations result from the query. Finally, the extra column provides additional information about the JOIN operation, for example using temporary if MySQL has to create a temporary table to execute the query. REFERENCE While the information EXPLAIN provides is often valuable, interpreting it requires some MySQL and database experience. Further interpretation aids can be found in the MySQL manual.

32 170 5 Database design example 1 The query determines an unordered list of all books with all their authors. All ID columns have primary indexes. USE mylibrary EXPLAIN SELECT * FROM titles, rel_title_author, authors WHERE rel_title_author.authid = authors.authid AND rel_title_author.titleid = titles.titleid table type possible_ key key_len ref rows titles ALL PRIMARY 23 rel_title_author ref PRIMARY_ PRIMARY 4 authors et PRIMARY 4 rel_title_author.authid 1 This means that all records are first read from the titles table without using an index. There is an index on titles.titleid, but since the query is going to collect all records anyway, the index is not used. Then, with the help of the primary indexes of rel_title_author and authors, the links to the other two tables are established. The tables are therefore optimally indexed; indexes are available for all parts of the query. Example 2 The query determines a list of all books (with their authors) that have been published by a particular publisher. The list is sorted by book title. Again, all ID columns are provided with primary indexes. In addition, title and publid are indexed in the titles table. EXPLAIN SELECT title, authname FROM titles, rel_title_author, authors WHERE titles.publid = 1 AND titles.titleid = rel_title_author.titleid AND authors.authid = rel_title_author.authid ORDER BY title table type key key_len ref rows Extra titles ref publid 5 const 9 where used; Using filesort rel_title_ author ref PRIMARY 4 titles.titleid 1 Using index authors eq_ref PRIMARY 4 rel_title_author. authid 1 The possible_keys column has been removed here for reasons of space. For interpretation: The tables are optimally indexed, i.e. there are indices for all parts of the query

33 5.5 MySQL table types 171 available. It is interesting that the title list (ORDER BY title) is obviously sorted externally, although an index is also available for the title column. The reason is probably that the title records are first selected according to the condition publid = 1 and the title index can then no longer be used. Example 3 This example uses the same SELECT query as Example 2, but assumes that titles.publid is not indexed. The result is that now all records of the titles table have to be read and no index can be used. table type key key_len ref rows Extra titles ALL 23 where used; Using filesort rel_title_ author ref PRIMARY 4 titles.titleid 1 Using index authors eq_ref PRIMARY 4 rel_title_author. authid MySQL table types Up until now it has been tacitly assumed that all tables are generally created as MyI-SAM tables. This has been the default behavior since MySQL 3.23 and applies automatically if you do not explicitly request a different table type when creating the table. This section gives a brief overview of which table types MySQL knows, what properties they have and when they are recommended. MyISAM Static: This table variant is used when all columns of the table have a fixed, predefined size. Access to such tables is particularly efficient. This applies even if the table is changed frequently (i.e. if there are many INSERT, UPDATE and DELETE commands). In addition, data security is very high because it is comparatively easy to extract data sets even with damaged files or other serious problems. MyISAM Dynamic: As soon as there is only a single VARCHAR, xxxtext or xxxblob field in the declaration of a table, MySQL automatically selects this table type. The main advantage compared to the static MyI-SAM variant is that the space requirement is usually significantly lower:

34 172 5 Database design chains and binary objects only take up storage space according to their actual size (plus a few management bytes). However, this means that data sets are of different sizes. If data records are changed later, their location within the database file may also have to be changed. A hole appears in the database file at the old location. It is also possible that the fields of a data record are not always stored in a contiguous block within the database file, but in different locations. All of this means that access to tables that are subject to major changes becomes progressively slower unless an OPTIMIZE TABLE or an optimization program is executed every now and then (myisamchk, see Chapter 11). MyISAM Compressed: Both static and dynamic MyISAM tables can be compressed with the additional program myisampack. This usually reduces the space requirement of the table to less than half (depending on the content of the table). Although each data record has to be decompressed when it is read out, access to the table may still be faster, for example with a combination of a slow drive and a fast processor. The decisive disadvantage of compressed MyISAM tables is that they can no longer be changed (i.e., they are now read-only tables!). ISAM: The ISAM table type is the predecessor of MyISAM. ISAM tables have a number of disadvantages compared to MyISAM tables. Probably the most important disadvantage is that the resulting files are not independent of the operating system. In general, you cannot continue to use an ISAM table that was created by MySQL under operating system x under operating system y. (Instead, you must use mysqldump and mysql for the transfer.) In general, there is no longer any reason to use ISAM tables (unless you are using a very old version of MySQL). BDB (Berkeley_DB): The essential property of BDB tables is that they allow transactions.The SQL commands BEGIN, COMMIT and ROLLBACK are only available if BDB tables are used. However, this advantage entails greater administrative effort, i.e. access to and changes to BDB tables are somewhat slower than with MyISAM tables. Another major disadvantage is that BDB tables have only been around for a relatively short time (version upwards); therefore, this type of table is not equally stable and mature. In addition, some MySQL functions such as the full-text index are currently only available for MyISAM tables. Otherwise, BDB tables differ in a few small details from MyI- SAM tables: BDB tables must have a primary key (with MyISAM-

35 5.5 MySQL table types 173 tables, this is usually also recommended, but not necessary), SHOW TABLE STATUS provides less information than with MyISAM tables etc. Practical information on the use of BDB tables and the use of the transaction commands can be found in the MERGE section : This type of table is also fairly new (available from Version). Essentially, it is the virtual union of several existing MyISAM tables, all of which have exactly the same column definition. A MERGE table made up of multiple tables can have some advantages over a single, very large MyISAM table, such as: For example, a higher reading speed (if the tables are distributed over several hard disks) or a circumvention of the maximum file size for some operating systems (e.g. 2 GByte for Linux 2.2 for 32-bit processors). One of the disadvantages is that it is not possible to insert records into MERGE tables (i.e. INSERT does not work). Instead, INSERT must be used on one of the sub-tables. HEAP: HEAP tables are only created in RAM (not on the hard drive) and use a so-called hash index, which enables particularly fast access to individual data records. Compared to normal tables, HEAP tables have a number of functional restrictions, of which only the most important are mentioned here: No xxxtext or xxxblob data types may be used, data records can only be searched for with = or <=> (not with <,>, <=,> =), AUTO_INCREMENT is not supported, and indexes can only be created on NOT-NULL columns. HEAP tables should always be used when relatively small amounts of data are managed at maximum speed. Since HEAP tables are only stored in RAM, they are lost as soon as My-SQL is exited. The maximum size of HEAP tables is determined by the parameter max_heap_table_size and can be set when MySQL is started. Temporary tables: Most of the table types listed above have the option of creating the tables temporarily. Such tables are automatically deleted as soon as the connection to MySQL is terminated. In addition, temporary tables are invisible to other MySQL connections (so that it is possible for two users to use temporary tables with the same name without getting in each other's way). Temporary tables are not a separate table type, but only a variant of the types described above. Temporary tables are sometimes created automatically by MySQL to execute complex SELECT queries.

36 174 5 Database design Temporary tables are not saved in the same directory as the other MySQL tables, but in a separate temporary directory (under Windows mostly C: \ Windows \ Temp, under Unix / Linux mostly / tmp or / var / tmp or / usr / tmp). The directory can be set when starting MySQL. NOTE Temporary tables of the HEAP type are often created. But this is by no means a mandatory combination! Note, however, that non-temporary HEAP tables are visible to all MySQL users until they are deleted. Table files MySQL databases are stored as directories and tables are stored as files. (The location for the database directories can be specified when MySQL is started. On Unix / Linux, / var / lib / mysql is often used, on Windows usually \ Program Files \ mysql \ data.) The following table shows the most important file identifiers that The following are used for table files: tablename.frm tablename.myd tablename.myi tablename.db Table schema (data types of columns, indices etc.) MyISAM table data MyISAM indices (all indices of the table) BDB table (data and all indices) 5.6 Databases , Creating Tables, and Indexes In the course of this chapter you learned a lot about database design. Now the question of the implementation is still open, i.e. how the database or its tables are actually created. As always, there are several ways to do this: Real SQL professionals will not shy away from taking the classic route and typing a series of CREATE-TABLE commands into mysql. Unfortunately, the complicated syntax of this command does not suit the rather creative character of the design process. Nevertheless, SQL commands are often the only feasible way to use table features that are rarely used. For beginners in the database world (and simply for everyone who does not like to make life more difficult than necessary), graphical user interfaces offer more convenience in database design. Some such programs (e.g. phpmyadmin and ksql) were introduced in the previous chapter.