
Many games that have a combat element have some type of rationale for determining damage dealt to the opponent. Sid Meier’s Civilization VI is the sixth installment in the series and it features, as it has in previous iterations, a combat element between military (and non-military) units.

In this article, we’ll use the combat log file that the game saves to learn a thing or two about modeling and data analysis. If you are new to data science / data analytics, I hope you’ll find value in this. If you are well-versed in the methods, I hope you’ll give your thoughts on how you might have approached this. If you are a business manager / executive and you’re trying to understand how your data science teams do their work, take this analysis as an example of what the process looks like on a sample data set. That’s the real point of this blog! Finally, if you’re a teacher, I hope you will use this in your classroom!

Some people have reported some formatting issues with the rest of the article. You can read this as a standalone article here.

The basic and default statistics that are computed are helpful in a number of ways.

Make sure that the count for each column is the same (1193 in this case). This is a small enough data set that I can spot check this with my eyes. For larger datasets, I would encourage you to write a wrapper around describe.

Is the standard deviation for any particular column zero? If it is, then you have a constant value.

Knowing the range (max – min) is useful as well. For example, AttackerObjType and DefenderObjType look to be in a tight range, whereas “Attacker Type” and “Defender Type” are spread out.

Identifying which columns are actually numeric and which are categorical is helpful as well. AttackingCiv, DefendingCiv, AttackerObjType, DefenderObjType, Attacker Type, and Defender Type all appear to be categorical values represented as numbers. Notice that the “ID” columns are missing from describe.

Given that this game is still only halfway through and I’m playing on Prince and am not a rockstar Civ player, unit strengths haven’t gotten too large. Though it looks like some civ or civs have units with strength over 100!

Next, for the attacker and defender strength modifiers, it looks like there are spikes at zero. That can make sense since it’s likely the case that unit vs unit combat happens most often against unmodified units. Or at least that’s what the data is implying about this game. You may also notice, by the eyeball metric, that there are also modification spikes around 5, 7, and 10. If you’ve played the game then you know that in-game bonuses to units come in those varieties. Negative modifiers surely exist as well because units can be damaged (resulting in a negative modifier) and terrain can also harm a unit’s overall strength.

Finally, in the third column, for the damage that the attacker took (AttackerDmg), there is a spike at 0. If you’ve played Civ VI, you may already have a guess as to why. Also, it looks clear that we have some natural boundary conditions. Units can’t take more than 100 in damage nor can they take negative damage. So that’s at least good in the sense that we don’t have outliers representing a data quality issue.
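As a rough illustration of the describe wrapper suggested above for larger datasets, here is a minimal sketch in pandas. The function name describe_checks, its bounds parameter, and the toy values are my own assumptions for illustration and are not taken from the actual combat log. Only the column names AttackerDmg and AttackerObjType come from the article. It automates the same checks done by eye: equal counts, zero standard deviation, the range of each column, columns that describe skips, and known boundary conditions such as damage staying between 0 and 100.

```python
import pandas as pd


def describe_checks(df, bounds=None):
    """Hypothetical helper: run a few sanity checks on top of df.describe().

    bounds is an optional dict mapping a numeric column name to a
    (low, high) tuple of allowed values, e.g. {"AttackerDmg": (0, 100)}.
    """
    # Transpose so that each numeric column becomes one row of statistics.
    stats = df.describe().T

    # Columns whose observed min or max falls outside the allowed bounds.
    out_of_bounds = [
        col
        for col, (low, high) in (bounds or {}).items()
        if stats.loc[col, "min"] < low or stats.loc[col, "max"] > high
    ]

    return {
        # True when every described column has the same non-null count.
        "counts_match": stats["count"].nunique() == 1,
        # A standard deviation of zero means the column is constant.
        "constant_columns": stats.index[stats["std"] == 0].tolist(),
        # Range (max - min) per described column.
        "ranges": (stats["max"] - stats["min"]).to_dict(),
        # Columns that the default describe() call left out.
        "not_described": [c for c in df.columns if c not in stats.index],
        "out_of_bounds": out_of_bounds,
    }


# Toy data with made-up values, only to show the shape of the report.
toy = pd.DataFrame(
    {
        "AttackerDmg": [0, 25, 100, 40],
        "AttackerObjType": [1, 1, 1, 1],
        "AttackerID": ["a", "b", "c", "d"],  # assumed to be a text column
    }
)
report = describe_checks(toy, bounds={"AttackerDmg": (0, 100)})
print(report)
```

By default describe only summarizes numeric columns, so in this sketch the text column shows up under not_described. Whether that is the reason the “ID” columns are missing in the real log depends on how they were loaded.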
