The well-known phrase “Knowledge is power” first appeared in print in 1597, in the work of Sir Francis Bacon. He certainly wasn’t talking about information systems as we know them in the tech industry, but the phrase still carries a great deal of significance: it acknowledges the potential of insightful information that becomes knowledge. But where does information come from? The technology ecosystem is data-driven, and finding value in data is becoming critical for successful businesses. That is the topic of this article: data and information. We compare data vs information to better understand their interdependence, their points of difference, and why information cannot exist without data. Let’s begin by defining each concept.
What is data?
Regardless of industry, data is driving the future, and a massive number of technologies depend heavily on it to thrive. According to the definition from TechDifferences, data is “raw, unanalyzed, unorganized, unrelated, uninterrupted material which is used to derive information after analyzation.” Essentially, data consists of plain facts, observations, statistics, characters, symbols, images, numbers, and more that are collected and can be used for analysis. Left alone, data is not very informative and in that sense is relatively meaningless; it gains purpose and direction once it is interpreted. Whether qualitative or quantitative, data is a set of variables that help construct outcomes. Another key characteristic of data is that it is freestanding: it does not depend on any other concept to exist, unlike information, which exists only because of data and is entirely dependent on it. Data is measured in bits and bytes. It can be represented in structured or unstructured forms such as tables, graphs, and trees, and it doesn’t have significance until it is analyzed to meet a specific user’s needs. Now, let’s move on to information.
What is information?
If data is the atom, information is the matter. Information is data that has already been processed, analyzed, and structured in a meaningful way to become useful. Once data is processed and gains relevance, it becomes information that is reliable and useful. According to this Forbes article, information is “prepared data that has been processed, aggregated and organized into a more human-friendly format that provides more context. Information is often delivered in the form of data visualizations, reports, and dashboards.” Information addresses the requirements of a user: it is the product of data that has been interpreted to deliver a logical meaning, which is what gives it significance and usefulness. As we’ve stated, information cannot exist without its building block: data. Once data is transformed into information, it should not contain useless details, as its whole purpose is to carry specific context and relevance.
What is data versus information in practice? Data is just a measurement. Say we have a temperature reading of 15°F. Is it cold? Yes. But is it a suitable temperature for keeping food in a refrigerator? No, because it is well below freezing. The reading itself is data; the answer, which depends on context, is information. Ultimately, the purpose of processing data and turning it into information is to help organizations make better, more informed decisions that lead to successful outcomes. To collect and process data, organizations use information systems (IS): combinations of technologies, procedures, and tools that assemble and distribute the information needed to make decisions.
From another point of view, there are similarities between data and information. Both are measurements, and both carry some measurement error. Various methods can be used to estimate that error, and the theory of errors and the processing of experimental results help assess data and information for a specific process. In practice, validation and verification of input data are also used to improve the quality of information.
What is the difference between data and information?
The terms are sometimes used interchangeably, but there is a clear distinction between the two. The fundamental difference between data and information is the meaning and value attributed to each. Data on its own is meaningless, but once processed and interpreted, it becomes information filled with meaning. The two concepts are also closely tied: there can be no information that is not based on data, and collecting data without ever turning it into information is pointless. To put it into context, think of data as any series of random numbers and words without meaning. Let’s take a look at some data vs. information examples. Here is an example of data: 4a 61 6e 65 20 44 6f 65 2c 0a 34 20 53 74 72 65 65 74 2c 0a 44 61 6c 6c 61 73 2c 20 54 58 20 39 38 31 37 34 0a. Once this data is processed, interpreted, formatted, and organized, you can see that it is the contact information of Jane Doe:
- Jane Doe,
- 4 Street,
- Dallas, TX 98174
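The step from the hexadecimal string to the readable address can be sketched in a few lines of Python. This is only an illustration: the helper name `decode_contact` is hypothetical, and the sketch assumes the bytes are ASCII-encoded text, which holds for this particular example.

```python
# Hex dump from the example above: each pair of hex digits is one byte.
RAW_HEX = (
    "4a 61 6e 65 20 44 6f 65 2c 0a "
    "34 20 53 74 72 65 65 74 2c 0a "
    "44 61 6c 6c 61 73 2c 20 54 58 20 39 38 31 37 34 0a"
)


def decode_contact(raw_hex: str) -> list[str]:
    """Turn a hex dump into readable lines of text (hypothetical helper)."""
    raw_bytes = bytes.fromhex(raw_hex)  # the data: a plain sequence of bytes
    text = raw_bytes.decode("ascii")    # interpretation: read the bytes as ASCII
    return text.splitlines()            # organization: one entry per line


if __name__ == "__main__":
    for line in decode_contact(RAW_HEX):
        print(line)
```

The bytes never change; what turns them into information is the decision to read them as ASCII text and lay them out as address lines.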
Another clear example of the distinction between data and information is temperature readings from across the globe. A long list of temperature readings means nothing of true significance until it is organized and analyzed to unearth information such as trends and patterns in global temperatures. Once the data is analyzed, users can identify whether the temperature has been on the rise over the last year or whether there’s a regional trend for specific natural disasters. Those types of discoveries are information that is extracted by analyzing data. Here’s a comparison table to help pinpoint the key differentiators between data and information.
Criteria | Data | Information |
Meaning | Raw facts that are the building blocks for information. | Combined data filled with relevance and significance. |
Form | Unorganized. | Organized. |
Basis | Records and observations. | Analysis. |
Dependency | Does not depend on information. | Depends on data. |
Measurements | Bits and bytes. | Meaningful parameters such as time, quantity, dates, etc. |
Significance and usefulness | Data alone has no significance. | Information is always significant, useful, and relevant. |
Specific | No. | Yes. |
You now have all the necessary knowledge to compare data and information. Data is usually unorganized and may contain measurement errors, zero values, or outliers (extreme highs and lows) that should be filtered out. Information, in turn, consists of processed facts that can be used for decision-making and for further processing in BI, data science, etc.
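As a rough illustration of that filtering step, the sketch below cleans a short list of temperature readings and then summarizes them. The readings, the valid range, and the function names `clean` and `summarize` are all made up for this example, and treating `0.0` as a placeholder rather than a real measurement is an assumption that would not hold for every sensor.

```python
from statistics import mean

# Hypothetical raw sensor readings in degrees Celsius. None marks a missing
# value; 0.0 and 999.9 are treated as sensor glitches in this sketch.
raw_readings = [21.4, 22.1, None, 0.0, 21.9, 999.9, 22.6, 23.0]


def clean(readings, low=-50.0, high=60.0):
    """Drop missing values, zero placeholders, and out-of-range outliers."""
    return [
        r for r in readings
        if r is not None and r != 0.0 and low <= r <= high
    ]


def summarize(readings):
    """Turn cleaned data into information: count, average, and a crude trend."""
    trend = "rising" if readings[-1] > readings[0] else "not rising"
    return {
        "count": len(readings),
        "average": round(mean(readings), 2),
        "trend": trend,
    }


if __name__ == "__main__":
    print(summarize(clean(raw_readings)))
```

The raw list is data; the summary dictionary is the kind of processed result that can feed a report or a decision.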
One bit and one byte
As the base units of measure for digital information, bits and bytes play a fundamental role in the subjects of data and information. Computers, with their millions of circuits and switches, use the binary system to represent on and off, or true and false, using bits and bytes. A bit, short for binary digit, is the smallest and most basic unit of data in computing, and it can hold only one of two values: 0 or 1. Bits are usually grouped into strings of 8 for storing data and executing instructions, and such a group is called a byte. The term byte was coined by Werner Buchholz in 1956. Computers use bytes to represent all kinds of information, including letters, numbers, images, audio, videos, and more. Given that almost all information in computers is larger than a bit, the byte is the smallest unit of size commonly listed in operating systems, networks, etc. To put this in perspective, statistics from TechJury projected that by 2020 every person would generate 1.7 megabytes of data every second. And what is a megabyte? In the binary convention used here, it is 1,048,576 bytes. Here are some helpful references for units of data measurement:
Bits.
- 8 bits constitute 1 byte.
Bytes.
- 1,024 bytes constitute 1 Kilobyte. (Please note that in 1998, the International Electrotechnical Commission (IEC) created the prefixes kibi, mebi, gibi, and so on to denote powers of 1024. The kibibyte came to represent 1024 bytes. These prefixes are now part of the International System of Quantities. Furthermore, the IEC specified that the kilobyte should be used only to refer to 1000 bytes.)
- 1,048,576 bytes constitute 1 Megabyte.
- 1,073,741,824 bytes constitute 1 Gigabyte.
- 1,099,511,627,776 bytes constitute 1 Terabyte.
- 1,125,899,906,842,624 bytes constitute 1 Petabyte.
- 1,152,921,504,606,846,976 bytes constitute 1 Exabyte.
- 1,180,591,620,717,411,303,424 bytes constitute 1 Zettabyte.
- 1,208,925,819,614,629,174,706,176 bytes constitute 1 Yottabyte.
- As of 2018, there’s no recognition for anything bigger than the yottabyte.
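The list above uses the binary convention, where each unit is 1,024 times the previous one. A small sketch can make the difference between that convention and the decimal one (1,000 per step) concrete. The function name `format_bytes` is hypothetical, and the unit labels follow the IEC prefixes mentioned in the note above.

```python
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]
DECIMAL_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(n: int, binary: bool = True) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    base, units = (1024, BINARY_UNITS) if binary else (1000, DECIMAL_UNITS)
    value = float(n)
    for unit in units:
        # Stop when the value fits in this unit, or when no larger unit exists.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


if __name__ == "__main__":
    print(format_bytes(1_048_576))                # binary convention
    print(format_bytes(1_048_576, binary=False))  # decimal convention
    print(format_bytes(1024 ** 8))
```

The same 1,048,576 bytes come out as exactly one unit under the binary convention and as slightly more than one unit under the decimal convention, which is why the two sets of prefixes exist.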
These staggering numbers translate into day-to-day examples of how much is transmitted across networks or stored in digital spaces. For example, it takes about 10.5 megabytes to store one minute of high-quality stereo digital sound, and at this rate one hour of music takes roughly 630 megabytes of storage space. A one-minute-long high-definition video takes approximately 100 megabytes of storage space. One tool for estimating the amount of space a video format can take up on a disk is the Video Space Calculator, where users can input different parameters to get an idea of just how many bytes a video will consume. With these figures in mind, and according to this article from Visual Capitalist, the digital universe was expected to reach over 44 zettabytes by 2020. If that number became a reality, it would mean 40 times more bytes than there are stars in the observable universe. By 2025, it’s estimated that 463 exabytes of data will be created worldwide on a daily basis. As you can see, bits and bytes are incredibly significant in the modern technology landscape, as they help organize data in a standardized way that in turn helps boost the data processing efficiency of network equipment, disks, and memory. For example, it is fairly common to hear the terms 32-bit and 64-bit, as they define the fixed size of data that a processor can transfer to and from memory.
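The audio figure can be checked with simple arithmetic. The sketch below assumes uncompressed CD-quality audio (44,100 samples per second, 2 bytes per sample, 2 channels); the article does not state which format it has in mind, so these parameters are an assumption, and the function name `audio_bytes` is hypothetical.

```python
# Assumed format: uncompressed CD-quality stereo audio.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def audio_bytes(seconds: int) -> int:
    """Storage needed for uncompressed audio of the given duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds


if __name__ == "__main__":
    for label, seconds in (("one minute", 60), ("one hour", 3600)):
        size = audio_bytes(seconds)
        print(f"{label}: {size:,} bytes "
              f"= {size / 1_000_000:.1f} MB (decimal) "
              f"= {size / 1_048_576:.1f} MiB (binary)")
```

Under these assumptions one minute comes to 10,584,000 bytes, which lands close to the per-minute figure quoted above; the hourly total depends on whether a megabyte is counted as 1,000,000 or 1,048,576 bytes.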
What is raw data and how is it transformed into information?
Now that we better understand the intricacies of data and information, let’s examine raw data and how it is transformed into useful information that ultimately leads to insights. Based on the definition provided by TechTerms, raw data is “unprocessed computer data. This information may be stored in a file, or may just be a collection of numbers and characters stored somewhere in the computer’s hard disk.” Typically, data that is entered into a database is referred to as raw data, and it can be user-generated or entered by the computer itself. Raw data comes from numerous sources such as relational databases, machine-generated data, data mining tools that extract data from the web, real-time data, data from Internet of Things (IoT) devices, human-generated data, and more. This type of data, which is often also referred to as primary data, is jumbled and has not been processed, cleaned, analyzed, or tested for errors in any way. Once it is processed and categorized, it becomes output data.
Because raw data is messy, it’s important to use deconstruction analysis techniques to process it: structured data allows easy retrieval, whereas raw data requires cleaning, preparation, and formatting before analysis can begin. Filtering, reviewing, and interpreting raw data leads to the extraction of information that is relevant, useful, and valuable. There is a procedure in computing known as extract, transform, load (ETL) that combines these functions in a single tool to pull data out of one database and place it into another. Typically, it is used to build data warehouses by extracting data from a source system, transforming it into an easy-to-analyze format, and loading it into another database, data warehouse, or system. For many years, ETL has been the de facto procedure to collect and process data, as it gives organizations the opportunity to capture and analyze data quickly. Once data is normalized through a procedure such as ETL, there needs to be a robust information system in place to understand and give meaning to the extracted data.
Can the terms data and information be used interchangeably? The correct answer is no. Data refers to raw, unanalyzed facts collected through specific measurements or counts, while information results from analysis and interpretation and provides meaningful knowledge. Therefore, the two terms should not be used interchangeably.
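To make the three ETL stages more tangible, here is a minimal sketch using only Python’s standard library. It is not how production ETL tools work internally; the CSV content, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system: inconsistent casing,
# stray whitespace, and one row with a missing amount.
RAW_CSV = """customer,amount
 alice ,10.50
BOB,4.25
alice,
carol,7.00
"""


def extract(raw: str) -> list[dict]:
    """Extract: read the raw rows exactly as the source provides them."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple[str, float]]:
    """Transform: normalize names and drop rows without a usable amount."""
    cleaned = []
    for row in rows:
        name = (row["customer"] or "").strip().lower()
        amount = (row["amount"] or "").strip()
        if name and amount:
            cleaned.append((name, float(amount)))
    return cleaned


def load(rows: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl(raw: str) -> sqlite3.Connection:
    """Run all three stages against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw)), conn)
    return conn


if __name__ == "__main__":
    conn = run_etl(RAW_CSV)
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    for customer, total in conn.execute(query):
        print(customer, total)
```

Once the rows are loaded in a consistent shape, a simple query such as the per-customer total is enough to start turning the data into information.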
Information systems best practices to gain value from data
As discussed, once data is normalized through procedures such as ETL, it is ready to be leveraged by an information system that gives it meaning and utility. With a comprehensive information system, users can apply the available tools, technologies, and techniques to transform data into information that will eventually become insights and knowledge. As Techopedia defines it, an information system is the “collection of multiple pieces of equipment involved in the dissemination of information. Hardware, software, computer system connections and information, information system users, and the system’s housing are all part of an IS.” These components come together to store, retrieve, transform, and disseminate information.
- Hardware: The computer itself along with its peripherals such as servers, routers, monitors, printers, storage devices, keyboard, mouse, etc.
- Software: The software system is what instructs the hardware what to do. The software collects, organizes, and manipulates data to carry out instructions.
- Data/databases: The raw material that the system stores and processes, and the source of the information it produces.
- Network/communication: Devices that communicate with each other to share information and resources.
- Procedures: Strategies, descriptions, policies, instructions, methods, and rules to use information systems.
- Users/people: The component that ties all the others together. People combine hardware, software, data, network, and procedures to generate valuable information.
Information systems require a comprehensive strategy to deploy best practices that drive actionable insights. Some of these best practices include data integration, data virtualization, event stream processing, metadata management, data quality management, and data governance, to name a few.
- Data integration: Combining data from several sources into a centralized view.
- Data virtualization: Retrieving and manipulating data to deliver a simple, unified, and integrated view of data in real time.
- Event stream processing: Analyzing time-based data as it’s created and before it’s stored, even as it streams from one device to another.
- Metadata management: Administration of data that describes other data.
- Data quality management: Practice of identifying data flaws and errors and simplifying the analysis and remediation of data flaws.
- Data governance: Management of availability, usability, integrity, and security of data.
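Of the practices listed above, event stream processing is perhaps the easiest to illustrate in a few lines. The sketch below computes a rolling average over values as they arrive, without storing the whole stream first. The function name `rolling_average`, the window size, and the sample readings are hypothetical; real stream-processing platforms offer far more than this.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_average(events: Iterable[float], window: int = 3) -> Iterator[float]:
    """Yield the average of the most recent `window` values as each event arrives."""
    recent = deque(maxlen=window)  # older values fall out automatically
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    # Hypothetical stream of sensor readings, processed one at a time.
    stream = iter([10, 20, 30, 40])
    for average in rolling_average(stream, window=2):
        print(average)
```

Because the function is a generator, each result is available as soon as its event has been seen, which mirrors the idea of analyzing data before it is stored.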
Conclusion: information vs data
In the last couple of years, information science and the technology associated with it have made significant leaps forward. With local servers moving to the cloud, smarter databases, key-value data stores, and more, data is being processed and analyzed at breakneck speed. Along with speed, another key factor in the success of processing data and information is the relatively low cost of hard disk drives, solid-state drives, and cloud storage. For instance, organizations store data in the cloud in raw format and then use procedures such as ETL along with information systems to generate insightful information. Data and information solve real-life problems across many applications by injecting knowledge into the decision-making process. From space programs, medicine, and education to retail, financial services, and software development, to name just a few, there is no limit to the number of industries that benefit from the value extracted from data and information. To sum up, these two interrelated concepts are the cornerstone of valuable insights that drive intelligent decisions and successful outcomes for businesses and organizations alike.