


Video Games
What Data-Driven Business Decisions Can We Do
When We look at Sales of Video Games Around The World?
Posted by Maya Sandler on December 22, 2020
Intent
Building a video game is a complicated task, as it needs to be appealing to the market and enjoyable for consumers, to bring in enough revenue for the company and to support growth. Video games continuously push the limits of graphics, processing power, and complexity, and therefore cost a lot of money to produce. To create a good gaming product, it needs analytics in several levels, in addition to product analytics: (1) The game developers level, who needs to know their users and their culture and preferences; (2) The publishers level, who put the game into the right market and sometimes in several platforms to reach more users.
​
In this project I asked several business questions ​about video games sales in the world. My approach is to always think about the customer's needs and what is appealing to him. I was interesting in examining how the user interest-scope is changing in different regions around the world. Coming from a different country to the US, I know that location and society really affect your preferences of games and even TV shows, and I was interested to see if there is a global effect in the world.
​
I was especially interested in answering these questions:
-
What are the most popular games of all times in the world?
-
What are the most popular genres in the world and in the different regions in the world?
-
What publishers have a high ability to get a game to market? Is there a difference between the different regions in the world?
-
What are the strongest regions in the market? How do their consumer behavior changes throughout the years?
-
What video game platforms are most common, globally and regionally?
These questions and the insights of this analysis have major effect on decision making in the gaming industry, as they reflect the user interests and habits and therefore can reflect on their continuing desire for new products in their area of interest.
​
I download the data from Kaggle, cleaned, transform, and imported the data via a Python code, validate and analyzed the data via MySQL and visualized it using Tableau Public. I conclude the project with my conclusions and insights from my analysis.
Description of the Data
Dataset (.csv) was downloaded from Kaggle and contained data of all the video game sales that sold more than 100,000 copies between the years 1980-2017. I used an already scraped dataset that was generated by a scrape of vgchartz.com website. (If you are interested in the script to scrape the data, it is available at this link). The dataset contains 16,598 records.
The database contains the following data:
-
Rank - ranking of overall sales.
-
Name - name of the games.
-
Platform - platform of the games (i.e. PC, PS4, etc.).
-
Release year - year that the game was released.
-
Genre - genre of the games (sports, action, etc.).
-
Publisher - publisher of the game (i.e. Nintendo, Sega, etc.).
-
NA sales - sales in North America (in millions of dollars).
-
EU sales - sales in Europe (in millions of dollars).
-
JP sales - sales in Japan (in millions of dollars).
-
Other sales - sales in the rest of the world (in millions of dollars).
-
Global sales - a total worldwide sales (in millions of dollars).
Preprocessing
I chose to write a Python code to read the data from a raw .csv file and loading it as an SQL database, and also to clean and transform the data to a consistent format in the process.
​
The code was based on a code published by Md Ansari linked here, and changed to allow for my needs, as described below. In a previous project (FEMA Disaster Declarations) I wrote a different code that made me actively copying and pasting the Python execution output to an SQL workbench, write some more lines in MySQL and execute it there to finally create a database. The new method I'm using in this project is based on Pandas, mysql.connector and SQLAlchemy is a short and easy way to:
-
Read the data to a temporary location
-
Manipulate the data (clean, change, and transform it) into the final ready-to-be-analyzed form.
-
Create a database, tables and columns in SQL.
-
Inject the pandas data frame into the tables on the SQL database.
And to achieve it all at once with one execution (of course while checking it each step on the way to see that it works correctly, and debugging when needed).
​
I will explain below what I did as well as attach the code for each step.
Import the Needed Modules to Python
This code uses (1) Pandas to read and manipulate comma-separated values file; (2) mysql.connector to connect my objects to MySQL server; (3) math to manipulate the data; and (4) SQLAlchemy to create the engine to maintains a connection available for talking to the database.
Read the Data From a .csv File Via a Python Code
The complete .csv is read to a temporary place.
Connecting to MySQL and Creating a Database
Use mysql.connector to connect to the MySQL on my computer, and within MySQL, creating a database. The code will notify me if there is an error connecting to MySQL:
Validate, Clean and Transform the Data
Validation that all the data was in proper formats and that it is consistent (years, publishers, etc.) via Microsoft Excel (using reorder and find). This process led to finding several rows with missing information, like the release year and publisher, and rows with a general notion of publisher name (“Unknown”). I transformed the data as part of the Python code.
​
Missing Information:
Missing information appeared as "N/A" in the original .csv file. This created a problem while executing the code in python because it runs pandas and it doesn't recognize what "N/A" is and sets it as a float data type. To achieve my purpose of transferring the original "N/A" to MySQL's NULL, I needed to add to the code another step of transforming the original N/A to None in pandas data frame, so it will automatically be transformed to NULL in MySQL.
​
Consolidate empty and "Unknown" publisher names:
Instead of deleting and losing data, I decided to gather the original "N/A" publisher name and the publishers written as “Unknown” into a single type of publisher name called "Other". Logically, it is basically the same meaning. This way all of them are consolidated as an "Other" publisher name, and can be counted in analysis. Count of the lines that were changed to others showed 261 games in the "Other" column.
​
This section of the code was between reading the .csv file and injecting the transformed data to MySQL:
To insert the data to a table, I connect to MySQL again, create a table with columns with the correct data types, and write a code in pandas to insert the data into the table. Here as well, the code will notify if there is an error connecting to MySQL:
Insert the Data into a Table in MySQL:
Importing the Processed Data to MySQL
To import the transformed data into MySQL, I created an SQLAlchemy engine and used .to_sql pandas method to inject the data into the database in MySQL.
Chunksize is 500 to allow for a big size dataset to be injected in chunks.
Thant's it, a code that does everything in one execution. I like it!
Now to MySQL for the analysis and fetching the interesting information.
Validate the Data in MySQL Dataset:
After transforming the data, I needed to verify that the changes were made and that the data remained intact. I ran several queries to do it:
1. Check that all rows were transferred:
I compared the amount of rows in the raw data to the amount of rows in the dataset table, using the below query in MySQL. Both were equal to 16598, so all the lines are here. Great!
2. Checking for "N/A", and NULL values:
In the transformation of the data I changed the "N/A" data to None that was supposed to transform to NULL during the insertion of the data to MySQL. Therefore, it was needed to be checked. I checked it in the release_year column, as the N/A originally appeared in this column.
Counting "N/A" retreived 0, and counting of NULL retreived 271. That means all N/A in the release year column turned into NULL values. Great!
3. Checking for NULL values in the publisher column:
The publisher column was also transform in the Python script. It was changed from the original N/A to pandas None to "other". This transformation needed to be checked as well:
Counting retrieved 0 "N/A" and 261 "Other", which is exactly the number counted in the python script. Super Great!
​
Now that the data is clean and transformed, We could start to analyze it in MySQL.
Results:
All queries were done via MySQL and visualized via Tableau.
The 50 Most Popular Games of All Times in the World:
To find our what is the most popular games in the world, I used the below query. It returns information about the 50 highest selling games in the world in descending order, and are colored according to their genre.
Note: The data showed that many games came out in several platforms, therefore in this analysis, I merged the games by name and grouped the platforms.

These results indicate that the best selling game is Wii Sport, and it sales are double then the next best selling game. I see a lot of blue (sport genre) and light blue (platform genres) as the highest selling games, but it need more analysis to be certain what are the best selling genres, and I'll get to that later.
​
These results seamed biased on the fact that some games are selling for longer time than others, so the next analysis I've some was to normalize the sales according to years since they have been released. This was achieved by the following query.

The graph shows the biggest selling games which brings more money to the company per year (the most profitable). The number above each bar is the number of years the game is in the market. Wii Sport is still the star of the sales, profiting more than double per year since its release. Next best sellers are Mario Kart Wii and Wii Sport Resort.
Again, the colors indicate a preference toward sport. We will dig into that deeper.
The Most Popular Genres:
People have a preference to specific genres, and gaming companies are aiming toward those preferences to increase their sales. Therefore, I looked into the most profitable genres in the world (called in the code "global_sales") via the following query.

Knowing that region and culture have a significant effect on people interests (games are totally part of a person's culture), I also looked to see if different regions in the world have different genres they are looking for (query similar to global query), and found some interesting insights:

These results show several interesting insights:
-
Sport genre is the 1st selling genre in all the world, except Japan, which are unlike all others is interested mostly in Role-Playing genre. This is an important insight which indicates a cultural difference, that is important when aiming to the Japanese user.
-
Platform genre is 2nd in Japan and North America, and therefore effect global sales tremendously. Working on a Platform genre game mean facing two different personas, culturally wise. This is an important aspect to consider when creating the game.
-
Racing genre is 3rd in the world preferences, and is affected mostly by Europe and other countries (not North America or Japan). This is important to know your users and to have a good sense of what they are looking for. What are they missing and this game can help in giving it to them.
-
Shooter genre is one of the selling genres in the world and is almost solely comprises of the North American market, as it is the 3rd biggest genres that is consumed in North America. Therefore, when creating Shooter genre games , the companies appeal to the North American consumer.
-
Puzzle genre is sold in high amounts to the North American and Japanese consumers. This is very important when building the pictures and images of the puzzles, as each cloture is going for specific images of interest.
-
Action genre is mostly sold to Europe and other countries. It would be interesting to further analyze the data to see the different games that appeal to both markets, but we will not go deeper in this project.
​
This analysis is very important for building specific personas of users in specific games to increase appealing to the right market and to increase revenue.
What Publishers Have a High Ability to Get the Game to Market?
The game developers need the publishers to put their game into the market. But how will they know if they are good? To get some insights on this questions I asked 2 questions: (1) What are the most profitable publishers in the world? (2) Are there specific publisher that are more appropriate for selling to a specific regional market?
​
To answer these questions, I analyzed the sales according to publishers to look at their ability to sell the games globally using the following query, as well as in specific regions in the world (using similar queries).
Best Global Publishers:
This data shows that Nintendo has the biggest selling abilities, more than 1.5 times than the next best selling publisher, Electronic Arts. Thereafter, sales drop and the next best publishers are Activision, Sony Computer Entertainment, etc. See next image for visualization.

Best Regional Publishers:
Depending on the gaming developer's consumer, they might want to go to a publisher who is more affective in reaching the specific regional market.
​
This data shows that Nintendo, Electronic Arts, Activision and Sony Computer Entertainment have the biggest selling abilities in North America, as well as Europe.
​


In Japan however, the best publishers are different. Nintendo is by far the strongest player and after it, with a drop of 3.5 fold are Namco Bandai Games, Konami Digital Entertainment and Sony Computer Entertainment. Therefore, when producing a Role-Playing genre, which is the #1 seller in Japan, game developers should approach Nintendo first and then to these publishers to bring their product to market.

The Strongest Market?
When creating a video game, as a game developer you may want to direct the game to the strongest market, rather than appealing to a niche audience, to increase revenue. Money and interests are changeable along the years, and therefore the market can change. Therefore, I asked what is the purchasing power of different regions in the world along the years.

I visualized the data so I will be able to see the different regions and their market along the years, and to be able to filter in/out specific publishers.
* Note that the drop of 2017 (the last year of records in the dataset) is not real data, as the data has only 3 points (games released), when in fact there have been many. Therefore, 2017 was taken out of the analysis.
​
This analysis has several insights regarding the stronger markets:
-
There is a strong growth (hockey stick) in the gaming industry since 1995. I explored online at some articles, and found that this jump in growth may be related to (1) the transition to 3D games (since 1990's), (2) the increase in PC games (since 1990's), and (3) 5th generation consoles (since 1993), which made the gaming industry a lot more reachable, interesting and real. There has also been a drop in sales of video games since 2012, and this .
-
North America had always been one of the biggest influencers on the video game industry.
-
Europe, which had a small purchasing power up until 1995, became a big market since the dissolution of the Soviet Union (that ended in 1991), and became the second biggest contributor to the gaming industry.
-
Japan influenced the economy of gaming from 1980's, but their purchasing power has been declining in the past 20 years. Since then, Japan became a niche market.
-
The other regions in the world increased in their purchasing power since the year 2000, and reached Japan, as the 3rd and 4th influencers on the gaming industry. However, since the "other regions" are a collection of many cultures, it is difficult to direct the games toward specific personas and a specific market analysis should follow the individual countries within the "other regions".
-
Interesting insights about the history and business strategy of the companies can also be pulled out of this data by filtering specific companies.
-
A good example is Atari, that created and published video games and then stopped in the early 80’s. Atari changed its business strategy and started to solely publish games in the 1999 - 2011, and sold almost uniquely in North America and Europe (see the following image).
-
Another good example is Nintendo, the biggest publisher in the world, which has fluctuations in sales every 2-5 years and sales in North America, Europe and Japan. Nintendo had a tremendous increase period in sales in the 2005-2010, and then came back their baseline. When they are high, they are very high.
-
Electronic Arts, being the 2nd best publisher in the world is also very interesting, since the company only started publishing video games since 1992, and quickly became a strong player, mainly in North America and Europe. Their regional sales indicate that in recent years they became bigger in Europe than in North America. Therefore, If a gaming developer company desires to create a game directed to the European consumers, Electronic Arts is a very good option, even better than Nintendo in recent years.
-



Preferable Platform for Video Games
When developing a game, you want it to reach a lot of people, therefore, it is important to chose the platform wisely. Many times, a good game will be published in multiple platforms to reach more possible users.
​
The query is intended to look into the differences in platform preferences between the regions in the world, as well as a global view.
Regional Preferable Platforms:
Examination of the strong platform in the world according to region, indicates that Wii is the strongest in the two most influencing regions on the gaming sales - North America and Europe. However, it is not the case in other regions. Also, there is a significant difference in the other platforms in each region. Most likely that users will not switch to a different platform once they bought one, unless it is old and they want to buy a new device (and they can afford it).
​
Therefore, this data is extremely important when producing a game to consumers, as it affects the question if the video game will be available for the users, and the whole interaction of the user with the hardware.

Conclusions
In this project I asked several business questions regarding the market and the selling abilities of specific games, genres, publishers and platforms between 1980 and 2017.
The analysis indicated that there are two specific strong markets (North America and Europe) with specific preferences of genres and platforms to publish the games in. ​Japan was revealed as a different market zone with unique preferences, and a smaller market. The data also indicated that specific regional market strengths for specific publishers, and a trend in their publishing abilities.
​
The insights in this project regarding video game sales, publishers, platforms and genres are important for video game developers who are seeking for the optimal affect, as well as video game publishers who are reaching for specific markets, and even for us as consumers.