4. Exploration of Structured vs. Unstructured Data

Data is everywhere, and it can be stored in many ways. Two general categories of data are: 

Structured data: Organized in a certain format, such as rows and columns.

Unstructured data: Not organized in any easy-to-identify way.

For example, you create structured data when you rate your favorite restaurant online. But when you use Google Earth to check out a satellite image of a restaurant location, you’re using unstructured data. Here’s a refresher on the characteristics of structured and unstructured data:

Structured data

As we described earlier, structured data is organized in a certain format, such as rows and columns. This makes it easier to store and query for business needs. If the data is exported, the structure goes along with the data. Even when an analysis draws on unstructured sources such as emails, photos, and social media sites, that data will most likely be structured for analysis before you even get to it. Because of that, I want to explore structured data a bit more, and there's definitely more to it than rows and columns. Structured data works nicely within a data model, which organizes data elements and how they relate to one another.

What are data elements? 

They’re pieces of information, such as people’s names, account numbers, and addresses. 

Data models help to keep data consistent and provide a map of how data is organized. 
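To make the idea of a data model more concrete, here is a minimal sketch in Python. The Customer and Account classes, their fields, and the sample values are all hypothetical, chosen only to illustrate data elements (names, account numbers, addresses) and one relationship between them. A real data model would usually live in a database schema rather than in code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Each field is a data element (hypothetical example fields).
    customer_id: int
    name: str
    address: str


@dataclass(frozen=True)
class Account:
    account_number: str
    customer_id: int  # relates each account to the customer who owns it


def accounts_for(customer, accounts):
    """Return the account numbers that belong to the given customer."""
    return [a.account_number for a in accounts if a.customer_id == customer.customer_id]


# Made-up sample records, for illustration only.
customers = [Customer(1, "Ada Example", "1 Sample Street")]
accounts = [Account("ACC-001", 1), Account("ACC-002", 1), Account("ACC-003", 2)]

owned = accounts_for(customers[0], accounts)
```

Because every record follows the same shape, the relationship between a customer and their accounts can be followed with a simple lookup. That is the "map" a data model provides.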

This makes it easier for analysts and other stakeholders to make sense of their data and use it for business purposes. In addition to working well within data models, structured data is well suited to databases, which store it in a consistent format.

This makes it easy for analysts to enter, query, and analyze the data whenever necessary. 
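As a rough illustration of entering and querying structured data, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The ratings table, its columns, and the restaurant names are made up for this example. They echo the restaurant-rating scenario from earlier rather than any real dataset.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per rating, with fixed columns.
conn.execute("CREATE TABLE ratings (restaurant TEXT, stars INTEGER)")

# Enter the data.
conn.executemany(
    "INSERT INTO ratings (restaurant, stars) VALUES (?, ?)",
    [("Cafe A", 5), ("Cafe A", 4), ("Diner B", 3)],
)

# Query the data: average rating per restaurant.
rows = conn.execute(
    "SELECT restaurant, AVG(stars) FROM ratings "
    "GROUP BY restaurant ORDER BY restaurant"
).fetchall()

conn.close()
```

The query works only because every row has the same columns. The same question would be much harder to answer from free-text reviews.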

This also makes data visualization easy because structured data can be applied directly to charts, graphs, heat maps, dashboards, and most other visual representations of data. Spreadsheets and databases that store data sets are widely used structured data sources. After exploring other data structures, you’ll check out more data types using a spreadsheet. 

Unstructured data

Unstructured data can't be organized in any easily identifiable manner. There is much more unstructured than structured data in the world, and most of the data being generated right now is unstructured. Lecture and audio files, text files, emails, photos, social media content, satellite imagery, presentations, PDF files, open-ended survey responses, and websites all qualify as unstructured data. These can be harder to analyze in their unstructured format.

The fairness issue

The lack of structure makes unstructured data difficult to search, manage, and analyze. However, recent advancements in artificial intelligence and machine learning algorithms are beginning to change that. Data scientists' new challenge is ensuring these tools are inclusive and unbiased. Otherwise, certain elements of a dataset can end up more heavily weighted or represented than others. As you're learning, an unfair dataset does not accurately represent the population, which causes skewed outcomes, low accuracy levels, and unreliable analysis.
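One simple way to check whether a dataset represents a population is to compare each group's share of the dataset with its share of the population. The sketch below is only an illustration under stated assumptions: the function name representation_gaps, the group labels, and the population shares are all hypothetical, and real fairness audits involve much more than this single comparison.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Return each group's share in the sample minus its share in the population.

    A positive gap means the group is over-represented in the sample;
    a negative gap means it is under-represented.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Made-up example: 8 of 10 records come from one group, while the
# (assumed) population is split evenly between the two groups.
sample = ["urban"] * 8 + ["rural"] * 2
population = {"urban": 0.5, "rural": 0.5}

gaps = representation_gaps(sample, population)
```

Here the "urban" group is over-represented and the "rural" group under-represented by the same margin, which is the kind of imbalance that can skew an analysis.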

Dr Nabeela: