Geographical Heatmap of Programming Languages
The popularity of each programming language is not evenly distributed across the globe. This project sets out to show that distribution visually.
GitHub, a favourite starting point
GitHub is the largest source code host, with a huge number of projects written in a wide range of languages, which makes it a very good place to start the exploration. The plan has three steps:
- Download GitHub project metadata
- Parse the project owner's location from their profile (if set)
- Plot the geographical distribution on Google Maps, weighted by code size
The code of the project is available on GitHub.
Downloading GitHub Project Metadata
GitHub API v3 is a quick way to reach a large number of repositories without downloading their whole codebases. Passing in a repository ID returns the project's metadata, which contains the information I need:
- Fraction of each language, measured in bytes of code (the unit the GitHub API reports)
- Owner information
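A minimal sketch of this step, assuming the repository is addressed by its "owner/name" string rather than by numeric ID. The two endpoints are part of the GitHub REST API; the helper names (fetch_json, language_fractions, repo_metadata) are made up for illustration and are not taken from the project's code.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_json(url, token=None):
    """GET a GitHub API URL and decode the JSON body (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        # A personal access token raises the rate limit considerably.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def language_fractions(byte_counts):
    """Turn {language: size} into {language: share of total size}."""
    total = sum(byte_counts.values())
    if total == 0:
        return {}
    return {language: size / total for language, size in byte_counts.items()}


def repo_metadata(full_name, token=None):
    """Collect the owner login and language shares for one repository."""
    repo = fetch_json(f"{API_ROOT}/repos/{full_name}", token)
    languages = fetch_json(f"{API_ROOT}/repos/{full_name}/languages", token)
    return {
        "owner": repo["owner"]["login"],
        "languages": language_fractions(languages),
    }
```

Only repo_metadata touches the network; language_fractions can be checked offline, for example with {"Python": 300, "C": 100}.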
With the same API, I can fetch the owner's metadata, which includes the location shown on their profile. Some of these locations are valid and some are not.
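Reading the location could look roughly like the sketch below. The user endpoint and its "location" field exist in the GitHub REST API; the function names are hypothetical, and validity is not checked here, only whether the field is set at all.

```python
import json
import urllib.request


def owner_location(profile):
    """Return the profile's location string, or None when unset or blank."""
    location = (profile.get("location") or "").strip()
    return location or None


def fetch_owner_location(login):
    """Fetch a user profile and extract its location (network call)."""
    # Unauthenticated requests are rate limited, so a real run would
    # probably send a token as well.
    url = f"https://api.github.com/users/{login}"
    with urllib.request.urlopen(url) as response:
        return owner_location(json.load(response))
```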
To plot each location at the right spot on the map, we need its coordinates. I use the Geocoding API of Google Maps to convert each location string into latitude and longitude.
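As a rough illustration, the conversion can be sketched against the JSON endpoint of the Google Maps Geocoding web service. The URL and the response fields (status, results, geometry.location) follow that service; the function names are my own, and the API key is a placeholder the caller has to supply.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(location, api_key):
    """Build the request URL for one free-text location string."""
    query = urllib.parse.urlencode({"address": location, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_geocode(payload):
    """Return (lat, lng) of the first result, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    point = payload["results"][0]["geometry"]["location"]
    return point["lat"], point["lng"]


def geocode(location, api_key):
    """Look up one location string (network call, counts against quota)."""
    with urllib.request.urlopen(geocode_url(location, api_key)) as response:
        return parse_geocode(json.load(response))
```

Taking only the first result is a simplification: an ambiguous string such as "Springfield" may resolve to a place the owner did not mean.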
To avoid repeated lookups that would exceed the API usage quota, I first filter out garbage locations and reduce the rest to a list of distinct values.
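The filtering rules are not spelled out here, so the sketch below uses a guessed heuristic: drop empty values, URLs, e-mail addresses and strings without any letters, then deduplicate case-insensitively. All names and rules in it are assumptions for illustration.

```python
import re


def normalize(location):
    """Collapse whitespace and lowercase, so near-duplicates compare equal."""
    return re.sub(r"\s+", " ", location).strip().lower()


def looks_like_place(location):
    """Hypothetical garbage filter; the real rules may well differ."""
    text = normalize(location)
    if len(text) < 2:
        return False
    if "http" in text or "@" in text:
        return False
    return any(ch.isalpha() for ch in text)


def distinct_locations(raw_locations):
    """Keep the first spelling of each plausible location, in input order."""
    seen = {}
    for raw in raw_locations:
        if raw and looks_like_place(raw):
            seen.setdefault(normalize(raw), raw.strip())
    return list(seen.values())
```

Each distinct string then needs only one lookup, and the result can be cached and mapped back onto every owner who used that string.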
This is where the heatmap layer of the Google Maps API comes into play: the locations parsed in the previous step are discrete points, and what matters is the intensity at each of them.
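Before plotting, the per-repository records have to be collapsed into one weight per coordinate. The sketch below assumes each record carries "lat", "lng" and a "languages" mapping of code size per language; that record layout and the function name are hypothetical. The output resembles the weighted location objects (a location plus a weight) that the heatmap layer accepts, but converting it to actual map objects is left to the front end.

```python
from collections import defaultdict


def heatmap_points(records, language):
    """Sum the code size of one language at each distinct coordinate."""
    totals = defaultdict(float)
    for record in records:
        size = record["languages"].get(language, 0)
        if size:
            totals[(record["lat"], record["lng"])] += size
    # Sorted only to make the output deterministic.
    return [
        {"lat": lat, "lng": lng, "weight": weight}
        for (lat, lng), weight in sorted(totals.items())
    ]
```

Summing raw sizes lets a single very large repository dominate its city; a logarithmic weight would be one alternative, though nothing here says which was used.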
The metadata of 117,000 repositories is now ready to plot. See for yourself how the distribution of each programming language looks across our planet.