In this week's reading, "Evolution of Data Storage Models", various storage models were discussed from flat files to object-oriented databases. While the document covers much of the topic, it is by no means complete. The needs of organizations, and their underlying technologies, continue to change. Because of these changes, database models will continue to evolve. Information Systems professionals who wish to remain relevant will need to stay abreast of these new database models. It is, therefore, important for Information Systems professionals to develop research skills so they can stay current on technologies as they change. This first assignment is an opportunity to build, or reinforce, those research skills.
Assume that your boss has just caught the last minute of an NPR report on the newest database trend: NoSQL databases. Your boss is intrigued and wants you to prepare a one-page executive summary on the topic for him. Specifically, the summary needs to explain what a NoSQL database is (including its major features) and how it differs from a relational database.
The summary should include three to five references with proper citations – use APA format. For the body of the document, use single spacing, one inch margins, and a 12-point font.
Points will be awarded as follows: Up to 20 points for complete/correct information; up to 10 points for proper references and citations; up to 10 points for quality writing and correct grammar.
Evolution of Data Storage Models
Historic Data Storage Models
This week’s presentation introduced databases and defined them as organized collections of related data. The presentation stressed the importance of the word “organized” in the definition since the data in a database must be structured to allow for ease of access. The way data is organized has evolved over the history of computing. This evolution has produced a variety of data structures, or models – each with its own advantages and disadvantages. This document walks you through some of the major models, describing the strengths and weaknesses of each. Specifically, we will look at the following:
Flat files
Hierarchical databases
Network databases
Relational databases
Object-oriented databases
Flat Files
Flat files have been around since the earliest days of computing. As such, they use one of the simplest data structures: sequential storage. This means that the records in a flat file are stored one after the other, much like the records in a text file or on a tape.
Sequential storage is appropriate for storing archived data or files where every record will be processed (e.g., a payroll file where every employee record is read to generate a paycheck). It is not, however, a good structure for searching. Since the records in a sequential file are read one after the other, searching a flat file for a specific record can be a time-consuming process. Assume you are searching for a record in a file containing 1,000 records. You could get lucky and the record you are looking for could be the first one in the file, meaning you only did one read. You could, just as easily, be unlucky and the record you are looking for could be the last one in the file, meaning you had to do 1,000 reads. In most cases, however, your record won’t be the first, or the last, in the file. It will most likely be somewhere in the middle. This means that, on average, you will have to do roughly n/2 reads to find a specific record, where n equals the number of records in the file. Using the 1,000-record example, searching for a record would take about 1,000/2, or 500, reads on average. Although 500 reads may seem like a small number, it is clearly not ideal. The problem only gets worse as the number of records increases, because the average number of reads grows in direct proportion to the size of the file.
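The average-case arithmetic can be checked with a small simulation. This is a hypothetical sketch: it counts one read per record examined, searches for each of 1,000 records in turn, and averages the counts. The exact average works out to (n + 1)/2, which is the n/2 figure above plus half a read.

```python
def sequential_search(records, target):
    """Read records one after another, as in a flat file.

    Returns (position, reads); position is None if the target is absent.
    """
    reads = 0
    for position, record in enumerate(records):
        reads += 1  # every record examined costs one read
        if record == target:
            return position, reads
    return None, reads


n = 1000
# Hypothetical file: record keys 0..999 stored one after the other.
records = list(range(n))

_, best_case = sequential_search(records, 0)       # target is the first record
_, worst_case = sequential_search(records, n - 1)  # target is the last record
_, missing = sequential_search(records, -1)        # target is not in the file

# Average over a search for every record, each equally likely to be sought.
average = sum(sequential_search(records, key)[1] for key in records) / n

print(best_case, worst_case, missing)  # 1 1000 1000
print(average)                         # 500.5
```

Note that a search for a record that is not in the file always costs the full n reads, since every record must be checked before giving up.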
Hierarchical Databases
In order to improve search performance and make data more accessible, the hierarchical database model was developed in the 1960s. Instead of using sequential storage, hierarchical databases use a tree structure to store data. The following example shows the structure of a hierarchical database for the BIS department. Each box in the structure is called a node. You can view the structure as an inverted tree. The top node (BIS) is called the root, while the bottom nodes (Jones, Zelinski, Getz, etc.) are the leaves. The lines connecting the nodes are called branches. The branches show which nodes are related to each other. For example, the BIS node is connected to BIS420 and BIS422 because those are courses in the BIS department. Likewise, the Jones, Zelinski, and Getz nodes are connected to BIS420 because those are students in that course. When two nodes are connected, the higher node is called the parent while the lower node is called the child. In this example, the Jones, Zelinski, and Getz nodes are all children of BIS420.
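One way to picture the tree is as nested Python dictionaries, where each key is a node and its value holds that node's children. This is a hypothetical sketch that uses only the names appearing in the reading (the full class rosters are not given); it illustrates the shape of the data, not how a hierarchical DBMS physically stores records.

```python
# Hypothetical BIS hierarchy: node name -> dict of child nodes.
# A node with an empty dict has no children, so it is a leaf.
BIS_TREE = {
    "BIS": {                                            # root
        "BIS420": {"Jones": {}, "Zelinski": {}, "Getz": {}},
        "BIS422": {"Smith": {}, "Getz": {}},
    }
}


def find_children(tree, name):
    """Depth-first search for a node by name; return its children's names.

    Returns None if no node with that name exists.
    """
    for node, children in tree.items():
        if node == name:
            return list(children)
        found = find_children(children, name)
        if found is not None:
            return found
    return None


def leaves(tree):
    """Collect every node that has no children (the leaves of the tree)."""
    result = []
    for node, children in tree.items():
        if children:
            result.extend(leaves(children))
        else:
            result.append(node)
    return result


print(find_children(BIS_TREE, "BIS"))     # ['BIS420', 'BIS422']
print(find_children(BIS_TREE, "BIS420"))  # ['Jones', 'Zelinski', 'Getz']
print(leaves(BIS_TREE))  # ['Jones', 'Zelinski', 'Getz', 'Smith', 'Getz']
```

Because each dictionary entry sits under exactly one parent key, the structure enforces the one-parent rule, which is why Getz shows up twice among the leaves.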
Using a tree structure improves search performance because you don’t have to read half the file to find the record you want. Instead, you simply follow the correct branches of the tree to quickly locate the record you are seeking. For example, if you wanted to find the record for Smith in BIS422, you would follow the branch from BIS to BIS422 and then the branch from BIS422 to Smith. Although hierarchical databases improved search performance, they also suffered from too much duplicate data. This problem was caused by the fact that, in a hierarchical database, a child node can only have one parent. In the example above, the student Getz is taking both BIS420 and BIS422. Since a child node can only have one parent, Getz’s node cannot be connected to two courses. His data must, therefore, be duplicated: one node is connected to BIS420 and another to BIS422. Given that most students take more than one class at a time, it should be obvious that the “one parent” rule would quickly lead to a database with lots of duplicate data.
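The search path and the duplicate-data problem can both be seen in a short sketch. It represents the BIS tree as nested dictionaries (a hypothetical representation), attaches an invented major to each student record, follows the branches to Smith, and then shows that updating one Getz copy leaves the other unchanged.

```python
# Hypothetical BIS hierarchy with student records as the leaves.
# Majors are invented values; only names from the reading are used.
tree = {
    "BIS": {
        "BIS420": {
            "Jones": {"major": "BIS"},
            "Zelinski": {"major": "BIS"},
            "Getz": {"major": "BIS"},
        },
        "BIS422": {
            "Smith": {"major": "BIS"},
            "Getz": {"major": "BIS"},  # one parent per child, so Getz is copied
        },
    }
}


def follow_path(root, path):
    """Follow one branch per level from the root; return the node reached."""
    node = root
    for branch in path:
        node = node[branch]  # raises KeyError if the branch does not exist
    return node


# Reaching Smith means following three branches: BIS, BIS422, Smith.
smith = follow_path(tree, ["BIS", "BIS422", "Smith"])

getz_420 = follow_path(tree, ["BIS", "BIS420", "Getz"])
getz_422 = follow_path(tree, ["BIS", "BIS422", "Getz"])
print(getz_420 is getz_422)  # False: two separate copies of the same student

# Updating one copy but not the other leaves the data contradictory.
getz_420["major"] = "Accounting"
print(getz_420["major"], getz_422["major"])  # Accounting BIS
```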
Having duplicate data is a problem for two reasons. First, it wastes storage space. In the early days of computing, storage space was expensive, so storing the same data repeatedly was not cost-effective. Today storage is cheap, but that does not mean it is free, and wasting it is still not an efficient way to do business. Second, storing duplicate data causes data integrity problems. As an example, let’s consider the two Getz records in the BIS database. If the student’s major was changed in one record, but not in the other, then Getz would have two majors. To be clear, this would not be a double major but, instead, two mutually exclusive majors. Which one is correct? You could pick the record that wasn’t modified, but the modified record might be correct. You could pick the record that was modified, but the modified record might be incorrect. With either choice, there is a chance that the decision will be wrong, and the integrity of the data would be in doubt. For these reasons, duplicate data can be a serious issue.
Network Databases
In an effort to address the duplicate data problem, network databases were developed in the 1970s. A network database looks a lot like a hierarchical database with one significant difference: in a network database, a child node can have more than one parent.
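The network model's key change, letting a child node have more than one parent, can be sketched in Python by storing each student record once and letting every course that lists the student point at that same record. This is a hypothetical illustration, not how a network DBMS was actually implemented, and the majors are invented values.

```python
# Hypothetical sketch of the network model: each student record exists once,
# and each course (parent) holds references to the shared record (child).
jones = {"name": "Jones", "major": "BIS"}  # majors are invented values
zelinski = {"name": "Zelinski", "major": "BIS"}
getz = {"name": "Getz", "major": "BIS"}
smith = {"name": "Smith", "major": "BIS"}

courses = {
    "BIS420": [jones, zelinski, getz],
    "BIS422": [smith, getz],  # the same Getz object, not a copy
}


def parents_of(record):
    """Return every course that links to this exact record object."""
    return [course for course, students in courses.items()
            if any(student is record for student in students)]


# Count distinct Getz records by object identity.
getz_records = {id(s) for students in courses.values() for s in students
                if s["name"] == "Getz"}

print(parents_of(getz))   # ['BIS420', 'BIS422']
print(len(getz_records))  # 1

# One update is visible through every parent, so there are no copies to disagree.
getz["major"] = "Accounting"
print([s["major"] for s in courses["BIS422"] if s["name"] == "Getz"])  # ['Accounting']
```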
Structuring the BIS data as a network database, we see that Getz’s data is recorded only once. Since a child node can have more than one parent, the Getz node is simply connected to both BIS420 and BIS422. This eliminates the duplicate data problem seen in hierarchical databases while still providing a structure that is easy to search. Even with its improvements, the network database model still suffered from a fundamental problem: a lack of flexibility. In both hierarchical and network databases, the connections between nodes were established when the databases were created. The relationships were then fixed, making them difficult to change. If new nodes needed to be added to the middle of the tree, or if relationships needed to be changed, the database had to be rebuilt. Depending on the size and complexity of the database, the process of rebuilding it could take a considerable amount of time. During that time, the database, and its data, would be unavailable to users. In most organizations, losing a database for hours, or potentially days, is unacceptable. Unfortunately, the inflexibility of hierarchical and network databases made this situation a real possibility.
Relational Databases
Given the need for a more flexible storage structure, an IBM researcher named E.F. Codd proposed the relational database model in 1970. Codd based his new model on a branch of mathematics called relational algebra. The first relational database management systems (RDBMS) came out in the 1970s and became widespread in the 1980s.
Figure: E.F. Codd, “In Codd We Trust”
In a relational database, data is structured in relations. A relation is a named two-dimensional table of data. Said another way, a relation is a table made up of rows and columns (a lot like a spreadsheet). The following example shows a table of employee data:
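The reading's employee table is not reproduced here, so the following sketch uses Python's built-in sqlite3 module with a hypothetical Employee relation (the column names and rows are invented) to show the idea of a named table made up of rows and columns.

```python
import sqlite3

# Hypothetical Employee relation: column names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmpNbr INTEGER PRIMARY KEY, Name TEXT, Dept TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(101, "Alvarez", "Sales"), (102, "Chen", "IT"), (103, "Okafor", "Sales")],
)

cursor = conn.execute("SELECT * FROM Employee ORDER BY EmpNbr")
columns = [desc[0] for desc in cursor.description]  # the relation's column names
rows = cursor.fetchall()                             # one tuple per row

print(columns)  # ['EmpNbr', 'Name', 'Dept']
for row in rows:
    print(row)
# (101, 'Alvarez', 'Sales')
# (102, 'Chen', 'IT')
# (103, 'Okafor', 'Sales')
```

Every row supplies one value for each named column, which is what makes the relation a two-dimensional table rather than a tree of nodes.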
Relational databases are much more flexible than hierarchical or network databases. To connect tables of a relational database, you simply need to have a common column in both tables. In the following example, we have three relations: Student, ClassGrade, and Class. Student and ClassGrade are related because they both have a StudentNbr column. Likewise, ClassGrade and Class are related because they both have a ClassNbr column.
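The three-table example can be sketched with Python's sqlite3 module. Only the table names and the StudentNbr and ClassNbr columns come from the reading; Name, Title, Grade, and every row are hypothetical. The query links the tables purely through their common columns.

```python
import sqlite3

# Hypothetical schema: only StudentNbr and ClassNbr come from the reading;
# the other columns and all sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student    (StudentNbr INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Class      (ClassNbr TEXT PRIMARY KEY, Title TEXT);
    CREATE TABLE ClassGrade (StudentNbr INTEGER, ClassNbr TEXT, Grade TEXT,
                             PRIMARY KEY (StudentNbr, ClassNbr));
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "Getz"), (2, "Smith")])
conn.executemany("INSERT INTO Class VALUES (?, ?)",
                 [("BIS420", "Course title A"), ("BIS422", "Course title B")])
conn.executemany("INSERT INTO ClassGrade VALUES (?, ?, ?)",
                 [(1, "BIS420", "A"), (1, "BIS422", "B"), (2, "BIS422", "A")])

# The common columns are the only links: StudentNbr joins Student to
# ClassGrade, and ClassNbr joins ClassGrade to Class.
rows = conn.execute("""
    SELECT s.Name, c.ClassNbr, c.Title, g.Grade
    FROM Student AS s
    JOIN ClassGrade AS g ON g.StudentNbr = s.StudentNbr
    JOIN Class AS c ON c.ClassNbr = g.ClassNbr
    ORDER BY s.Name, c.ClassNbr
""").fetchall()

for row in rows:
    print(row)
# ('Getz', 'BIS420', 'Course title A', 'A')
# ('Getz', 'BIS422', 'Course title B', 'B')
# ('Smith', 'BIS422', 'Course title B', 'A')
```

Getz is stored once in Student; each course he takes adds only a small ClassGrade row, so the Getz duplication seen in the hierarchical model does not occur.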
Using common fields to connect tables makes it easy to create relationships and modify them as things change. This flexibility has made relational databases the dominant storage model in business. As such, they will be the central focus of this course.
Object-Oriented Databases
In the 1990s, the dominance of relational databases was challenged by a new model based on object-oriented programming. These object-oriented databases structured the data as objects with properties and methods. The push to create object-oriented databases was largely driven by the rising popularity of object-oriented programming (OOP). While OOP has become a standard approach to programming, object-oriented databases never really achieved widespread acceptance. In part, this was because relational databases had become ubiquitous in the 1980s, and the cost of converting them to a new model would have been prohibitive. Relational databases also worked well and were, for many, intuitive to use. For these reasons, object-oriented databases were unable to supplant the relational model in business.
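To make "objects with properties and methods" concrete, here is a minimal sketch that persists a Python object using the standard-library shelve module. shelve is a simple pickle-backed key-value store, not an object-oriented DBMS; it stands in here only to show data being saved and loaded as a whole object rather than as rows in tables. The Student class and its enroll method are hypothetical.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field


@dataclass
class Student:
    # Properties (attributes) are saved with each stored object.
    name: str
    major: str
    classes: list = field(default_factory=list)

    # Methods live on the class; pickle stores only the attributes plus a
    # reference to the class, so the method is available again after loading.
    def enroll(self, class_nbr: str) -> None:
        if class_nbr not in self.classes:
            self.classes.append(class_nbr)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "students")

    with shelve.open(path) as db:
        getz = Student("Getz", "BIS")
        getz.enroll("BIS420")
        getz.enroll("BIS422")
        db["Getz"] = getz  # the whole object is pickled and stored under a key

    with shelve.open(path) as db:
        loaded = db["Getz"]  # comes back as a Student, not as a row of values
        print(type(loaded).__name__, loaded.classes)  # Student ['BIS420', 'BIS422']
```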