Understanding YAML and JSON

A Beginner's Guide

Introduction

In today's world of data-driven applications, efficient and reliable data serialization is a critical skill for any developer. Two of the most popular data serialization formats are YAML and JSON. While they have many similarities, they also have some key differences that are important to understand. In addition, tools like JSON Path make it easier to query and extract data from documents in these formats. In this article, we will explore YAML, JSON, and JSON Path in detail and learn how to use them effectively in your own applications.

What is YAML?

YAML, which originally stood for "Yet Another Markup Language" and now officially stands for the recursive "YAML Ain't Markup Language," is a popular data serialization language that is widely used for configuration files. The simplest form of data in YAML is a key-value pair, written as the key, a colon, a space, and then the value. For example, to represent a person's name and age, you would write:

name: John Doe
age: 30

To represent an array (a list), start each element on its own line with a dash followed by a space. For instance, the following YAML represents a list of hobbies:

hobbies:
   - reading
   - painting
   - traveling

A dictionary groups a set of properties under a single key. In the following example, the properties street, city, state, and zip belong to the "address" dictionary, and each one is indented by the same number of spaces:

address:
   street: 123 Main St
   city: Anytown
   state: CA
   zip: '12345'

Note that the value for zip is in quotes, so it is treated as the string "12345" rather than as a number.
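
To make the examples above concrete, here is a minimal sketch in Python of the data a YAML loader typically builds from them: mappings become dictionaries and sequences become lists. Combining the snippets into one document and the variable name person are assumptions for illustration; no YAML library is used (a real program might call a loader such as PyYAML's yaml.safe_load), so the snippet runs with the standard library alone.

```python
# The YAML snippets above (name/age, hobbies, address), combined into one
# document and written as the Python objects a YAML loader typically returns.
# "person" is a hypothetical name used only for this example.
person = {
    "name": "John Doe",        # name: John Doe  -> a plain string
    "age": 30,                 # age: 30         -> unquoted number, an int
    "hobbies": [               # "- item" lines  -> a list, kept in order
        "reading",
        "painting",
        "traveling",
    ],
    "address": {               # indented keys   -> a nested dictionary
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345",        # zip: '12345'    -> quoted, so a string
    },
}

print(type(person["age"]).__name__)             # int
print(type(person["address"]["zip"]).__name__)  # str
print(person["hobbies"][0])                     # reading
```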

Spaces

In YAML, indentation is crucial, because it determines the structure and hierarchy of the data. For example, consider again the dictionary that holds the properties of an address:

address:
   street: 123 Main St
   city: Anytown
   state: CA
   zip: '12345'

Because these properties are all indented by the same number of spaces, they belong to the "address" dictionary. If we were to add extra spaces before the "state" and "zip" properties, the parser would try to place them under "city" instead. Since "city" already has the plain value "Anytown," it cannot also hold nested key-value pairs; YAML does not allow mapping values at that position, so the result is a syntax error.

For example, the following YAML is invalid, because "state" and "zip" are indented more deeply than "street" and "city":

address:
   street: 123 Main St
   city: Anytown
    state: CA
    zip: '12345'

When deciding whether to use a list or a dictionary in YAML, it ultimately depends on the data and how it should be structured. For instance, a single object with several properties can be stored in a dictionary. If the object has sub-properties, it can be represented as dictionaries within the main dictionary. On the other hand, if there are multiple items of the same type, such as car names, they can be stored in a list. Finally, if all the information about each car needs to be stored, a list of dictionaries can be used to represent all the information about multiple cars in a single YAML file.

Advanced YAML

Fruits:
  - Banana:
      Calories: 105
      Fat: 0.4g
      Carbs: 27g
  - Grape:
      Calories: 62
      Fat: 0.3g
      Carbs: 16g

In YAML, we can build complex data structures, such as lists containing dictionaries, or dictionaries containing lists and other dictionaries. The example above defines a key called "Fruits" whose value is a list with two elements, Banana and Grape. These elements are not simple values: each one is a dictionary whose single key (the fruit name) maps to another dictionary holding the calories, fat, and carbs. This lets us represent nested, structured data in a way that is intuitive and easy for humans to read and work with.
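
A minimal sketch of how the Fruits document is nested once loaded into Python, assuming a YAML loader that maps YAML mappings to dicts and sequences to lists (the variable names are hypothetical). Each indentation level in the YAML corresponds to one lookup step: the Fruits key, then a list position, then the fruit name, then the nutrient.

```python
# The Fruits YAML above as Python data: a dict whose "Fruits" value is a list;
# each list item is a one-key dict, and that key maps to another dict.
fruits_doc = {
    "Fruits": [
        {"Banana": {"Calories": 105, "Fat": "0.4g", "Carbs": "27g"}},
        {"Grape": {"Calories": 62, "Fat": "0.3g", "Carbs": "16g"}},
    ]
}

# Walk down the structure one indentation level at a time.
banana = fruits_doc["Fruits"][0]["Banana"]
print(banana["Calories"])  # 105

# Loop over the list; each item has exactly one key, the fruit name.
for item in fruits_doc["Fruits"]:
    for name, facts in item.items():
        print(name, facts["Carbs"])
```

Note that values like 0.4g are not plain numbers, so they stay strings, while 105 and 62 become integers.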

When to use a list or a dictionary?

The choice of data structure depends on the type of data being stored. For instance, let's consider the example of a car, which is a single object with properties such as color, model, transmission, and price. To store different information or properties of a single object, we use a dictionary in a key-value format.

However, the data structure need not be as simple as this. For example, if we need to split the model into model name and make year, we can represent this as a dictionary within another dictionary. In this case, the single value of the model is replaced with a small dictionary with two properties, name and year, resulting in a dictionary within another dictionary.

Suppose we want to store the names of six cars, each formed from the color and model of the car. In this case, we would use a list (array), since they are multiple items of the same type of object. Since we are only storing the names, we have a simple list of strings.

What if we want to store all the information about each car, including color, model, transmission, and price? In that case, we would change the array from a list of strings to a list of dictionaries: we expand each item in the array and replace the name with the dictionary we built earlier. Using a list of dictionaries, we can represent all the information about multiple cars in a single YAML file.

So, in summary, the difference between dictionaries, lists, and lists of dictionaries lies in the type of data being stored and the level of complexity of the data structure needed to represent that data.
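
The progression in this section, from a single dictionary to a list of dictionaries, can be sketched in Python as follows. The colors, model names, years, transmissions, and prices are hypothetical values made up for illustration; only the shapes of the structures follow the text.

```python
# 1. A single object -> a dictionary of properties (values are hypothetical).
car = {"color": "Blue", "model": "Corvette", "transmission": "Manual", "price": 20000}

# 2. Splitting "model" into name and year -> a dictionary within a dictionary.
car["model"] = {"name": "Corvette", "year": 1995}

# 3. Only the names of six cars -> a simple list of strings.
car_names = [
    "Blue Corvette", "Grey Corvette", "Red Corvette",
    "Green Corvette", "Blue Mustang", "Black Mustang",
]

# 4. All information about several cars -> a list of dictionaries.
cars = [
    car,
    {
        "color": "Grey",
        "model": {"name": "Corvette", "year": 2000},
        "transmission": "Automatic",
        "price": 25000,
    },
]

print(cars[1]["model"]["year"])  # 2000
print(len(car_names))            # 6
```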

Key Notes

Dictionaries are unordered collections, whereas lists are ordered. For example, consider two dictionaries that describe the same properties of a banana, but in a different order: in the first, "Fat" is defined before "Carbs," and in the second, "Carbs" comes first, followed by "Fat." The two dictionaries are still considered the same, because the order of the properties doesn't matter as long as each property has the same value. This is not the case for lists or arrays.

Lists (arrays) are ordered collections, so the order of items matters. Two lists that contain the same items, such as "apple" and "banana," but in different positions are not the same. This is an important point to keep in mind while working with data structures. Finally, remember that in YAML any text starting with a hash symbol (#), whether at the beginning of a line or after a value and a space, is a comment and is ignored.
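
Python's built-in comparisons happen to follow the same rule, which makes it easy to check. Python dictionaries remember the order in which keys were inserted, but dictionary equality ignores that order, while list equality compares items position by position. The variable names below are hypothetical.

```python
# Two dictionaries with the same banana properties, written in a different order.
banana_1 = {"Calories": 105, "Fat": "0.4g", "Carbs": "27g"}
banana_2 = {"Calories": 105, "Carbs": "27g", "Fat": "0.4g"}
print(banana_1 == banana_2)  # True: key order does not matter

# Two lists with the same items in different positions.
fruits_1 = ["apple", "banana"]
fruits_2 = ["banana", "apple"]
print(fruits_1 == fruits_2)  # False: position matters in a list
```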

Differences between YAML and JSON

  • YAML and JSON can represent the same kinds of data. The difference lies in the syntax they use to organize it.

  • YAML uses indentation to structure data into lists and dictionaries, while JSON uses braces (curly brackets) and square brackets. In YAML, a set of properties defined at the same indentation level forms a dictionary. In JSON, a dictionary (called an object) is everything enclosed within a pair of curly braces {}.

  • To denote an item in a list, YAML starts it with a hyphen (-), while JSON encloses the whole list in square brackets []. In JSON, the items of a list, like the key-value pairs of an object, are separated by commas.

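  • Beyond syntax, the two formats also differ in usage: YAML supports comments and is common for hand-edited configuration files, while JSON has no comment syntax and is widely used for exchanging data between programs, such as in web APIs.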
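
As a quick illustration, Python's standard json module can print data from the earlier YAML examples in JSON syntax: the indentation-based YAML dictionary becomes a brace-delimited object, and the hyphen-based YAML list becomes a bracketed, comma-separated array. The variable names are hypothetical; only the notation changes, not the data.

```python
import json

# Same data as the hobbies and address YAML examples earlier in the article.
data = {
    "hobbies": ["reading", "painting", "traveling"],
    "address": {"street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345"},
}

# Pretty-printed JSON: objects use {}, lists use [], items are comma-separated.
text = json.dumps(data, indent=2)
print(text)

# A list on its own, in compact form.
print(json.dumps(data["hobbies"]))  # ["reading", "painting", "traveling"]

# Parsing the JSON text back gives an equal Python structure.
print(json.loads(text) == data)     # True
```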

JSON Path

  • JSON Path is a query language for parsing data in JSON (or YAML) format, much as SQL is used to query data in a database. It allows you to retrieve subsets of data from a given JSON document. To retrieve the details of a car, for example, you use the query "car". To get a specific field from within a dictionary, such as the color of the car, you use the query "car.color", where the dot selects a particular field within a dictionary.

  • Now suppose the car and the bus are enclosed in a parent dictionary named vehicles, with child dictionaries car and bus that each have the properties color and price. In that case, we use "vehicles.car" to get the car's details and "vehicles.car.color" to get the car's color. Dot notation is used in this way to extract the properties of dictionaries, and of dictionaries nested within dictionaries, in JSON data.

    However, if we try these queries as written, they won't work. The top-level dictionary of a JSON document has no name; it is called the root element and is denoted by a dollar sign ($). A query for a JSON document with a dictionary at its root should therefore start with a dollar sign followed by a dot. For the first document, where car and bus sit directly in the root dictionary rather than inside vehicles, the queries become "$.car" and "$.car.color"; for the second, they become "$.vehicles.car" and "$.vehicles.car.color".

    The results of a JSON Path query are always returned as an array, even when the query matches only a single value.

  • Now let us look at lists or arrays. Here we have a list of different types of vehicles, and there are no curly braces, so there are no dictionaries, just a simple list. The root element in this JSON document is an array, denoted by square brackets []. To get the first element, use square brackets in the query and specify the position of the item you want inside them. Indexing starts at 0, so the 1st element is at index 0, the 2nd at index 1, and so on. Always remember to start with the $ symbol for the root element: to get the 1st element of the list, write "$[0]", and to get the 1st and 4th elements, write "$[0,3]".

  • Now let us combine dictionaries and lists. Here we have a car's data with the properties color, price, and wheels. The wheels property is a list with four items, each of which is a dictionary. To build a query that retrieves the model of the second wheel, we start with the dollar symbol for the root element. The root element is a dictionary, denoted by curly braces, so the query continues with a dot after the $ symbol; the dot selects a field of a dictionary. Within the root dictionary we have car, and within car we have wheels, so the query "$.car.wheels" returns the array of all the wheels. To get the second wheel, we use "$.car.wheels[1]". Notice that there is no dot before [1], because wheels is an array, not a dictionary. Now that we have the second wheel's details, we get its model by adding ".model" to the query: "$.car.wheels[1].model".

  • Now let's look at applying conditions, or criteria, in a query. We need them to filter the items in a list according to specific criteria. For example, suppose we have a list of numbers and want only the numbers greater than 40. We start the query with $ for the root element, and since the root element is an array, we follow it with square brackets. Inside the square brackets we write the criterion as a filter: ?( ) encloses the condition, and @ stands for each item of the array as it is checked. The full query is "$[?(@ > 40)]", which reads as "check whether each item in the array is greater than 40".

  • Returning to the earlier car example, suppose we want the model of the rear-right wheel. Hard-coding the wheel's position, as we did with [1] above, only works if the wheels are always listed in the same order. To write a query that works with data entered in any order, we can instead use a criterion that identifies the wheel whose location property is set to rear-right: "$.car.wheels[?(@.location == 'rear-right')].model".
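
To see how the queries above are evaluated step by step, here is a deliberately small, hypothetical JSON Path evaluator in Python. It is not a real library and supports only the subset used in this article ($, .key, [n], [n,m], and filters of the form @ > 40 or @.location == 'rear-right'); in practice you would use an existing, complete JSONPath implementation. The car, vehicle, and number data are made-up values shaped like the article's examples.

```python
import ast
import operator
import re

# Hypothetical helper for a tiny JSON Path subset:
#   $            root element
#   .key         field of a dictionary
#   [n] / [n,m]  item(s) of an array, 0-indexed
#   [?(...)]     filter; @ is the current item, e.g. @ > 40 or @.location == 'x'
STEP = re.compile(r"\.(\w+)|\[(\d+(?:\s*,\s*\d+)*)\]|\[\?\((.*?)\)\]")
CONDITION = re.compile(r"@(?:\.(\w+))?\s*(==|!=|>=|<=|>|<)\s*(.+)")
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def matches(item, condition):
    """Return True if one array item satisfies a filter such as '@ > 40'."""
    m = CONDITION.fullmatch(condition.strip())
    if m is None:
        raise ValueError(f"unsupported filter: {condition}")
    field, op, literal = m.groups()
    if field is not None:  # @.field compares one property of a dictionary item
        if not isinstance(item, dict) or field not in item:
            return False
        item = item[field]
    value = ast.literal_eval(literal.strip())  # 40 -> int, 'rear-right' -> str
    try:
        return OPS[op](item, value)
    except TypeError:  # e.g. comparing a string with a number
        return False


def query(document, path):
    """Evaluate path against document; results always come back as a list."""
    if not path.startswith("$"):
        raise ValueError("a query must start with $, the root element")
    nodes, pos = [document], 1
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at: {path[pos:]}")
        key, indexes, condition = m.groups()
        if key is not None:  # .key selects a field of each dictionary
            nodes = [n[key] for n in nodes if isinstance(n, dict) and key in n]
        elif indexes is not None:  # [0] or [0,3] selects array positions
            wanted = [int(i) for i in indexes.split(",")]
            nodes = [n[i] for n in nodes if isinstance(n, list)
                     for i in wanted if i < len(n)]
        else:  # [?(...)] keeps only the array items that match the filter
            nodes = [item for n in nodes if isinstance(n, list)
                     for item in n if matches(item, condition)]
        pos = m.end()
    return nodes


# Hypothetical data shaped like the article's examples.
car_doc = {"car": {"color": "blue", "price": 20000, "wheels": [
    {"model": "KX-1", "location": "front-right"},
    {"model": "KX-2", "location": "front-left"},
    {"model": "KX-3", "location": "rear-right"},
    {"model": "KX-4", "location": "rear-left"},
]}}
vehicles = ["car", "bus", "truck", "bike"]
numbers = [12, 43, 23, 12, 56, 43, 93, 32, 45, 63, 27, 8, 78]

print(query(car_doc, "$.car.color"))            # ['blue']
print(query(vehicles, "$[0,3]"))                # ['car', 'bike']
print(query(car_doc, "$.car.wheels[1].model"))  # ['KX-2']
print(query(numbers, "$[?(@ > 40)]"))           # [43, 56, 43, 93, 45, 63, 78]
print(query(car_doc, "$.car.wheels[?(@.location == 'rear-right')].model"))  # ['KX-3']
```

Because every step maps a list of nodes to a new list of nodes, the result is always a list, which mirrors the point above that JSON Path results are returned as an array. The filter query still finds KX-3 if the wheels are reordered, whereas "$.car.wheels[2].model" would not.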

Resources

KodeKloud