Dataset

TIOBE Software: Tiobe Index

Added By mrflip

The TIOBE Programming Community index gives an indication of the popularity of programming languages. The index is updated once a month. The ratings are based on the number of skilled engineers worldwide, courses and third-party vendors. The popular search engines Google, MSN, Yahoo!, Wikipedia and YouTube are used to calculate the ratings. Observe that the TIOBE index is not about the best programming language or the language in which most lines of code have been written.


TIOBE Programming Community Index Definition

Since there are many questions about the way the TIOBE index is assembled, a special page is devoted to its definition.

Ratings

The ratings are calculated by counting hits on the most popular search engines. The search query used is

+"<language> programming"

The search query is run against Google (regular web search), Google Blogs, MSN, Yahoo!, Wikipedia and YouTube, covering the last 12 months. The web site Alexa.com has been used to determine the most popular search engines.

The number of hits determines the rating of a language. The counted hits are normalized per search engine over the first 50 languages; in other words, the first 50 languages together have a score of 100%. Let’s define “hits50(SE)” as the sum of the hits of the first 50 languages for search engine SE, and “hits(PL,SE)” as the number of hits for programming language PL for search engine SE.

Possible false positives for a query are already filtered out in the definition of “hits(PL,SE)” by applying a manually determined confidence factor per query. For example, a query such as “Basic programming” also returns pages that contain “Improve your basic programming skills in Java”. The first 100 pages per search engine are checked for false positives, and the result defines the confidence factor: if this factor is 90%, only 90% of the hits are counted in “hits(PL,SE)”. An overview of the confidence factors can be found in the groupings table below.
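The two steps described above (scaling raw hits by the confidence factor, then dividing by hits50(SE)) can be sketched for a single search engine as follows. This is an illustrative sketch, not TIOBE's implementation: the hit counts are made up, the function names are hypothetical, and ranking the "first 50" languages by their adjusted hits on that engine is an assumption, since the text does not say how the first 50 are chosen.

```python
from typing import Dict

# Hypothetical raw hit counts from one search engine SE (illustrative numbers only).
raw_hits: Dict[str, int] = {"Basic": 1_000_000, "Java": 4_000_000, "C": 3_000_000}
# Manually determined confidence factor per query (fraction of hits judged true positives).
confidence: Dict[str, float] = {"Basic": 0.9, "Java": 1.0, "C": 1.0}


def adjusted_hits(raw: Dict[str, int], conf: Dict[str, float]) -> Dict[str, float]:
    """hits(PL,SE): raw hits scaled by the confidence factor of the language's query."""
    return {pl: raw[pl] * conf[pl] for pl in raw}


def normalized_shares(hits: Dict[str, float], top: int = 50) -> Dict[str, float]:
    """Divide each of the top `top` languages' hits by hits50(SE), their summed hits.

    Assumption: the top languages are ranked by adjusted hits on this engine.
    """
    ranked = sorted(hits, key=hits.get, reverse=True)[:top]
    hits50 = sum(hits[pl] for pl in ranked)
    return {pl: hits[pl] / hits50 for pl in ranked}


hits = adjusted_hits(raw_hits, confidence)
shares = normalized_shares(hits)
print({pl: round(s, 3) for pl, s in shares.items()})  # {'Java': 0.506, 'C': 0.38, 'Basic': 0.114}
```

Because every share is divided by the same hits50(SE), the shares of the included languages always add up to 1 (100%), which is what the normalization is meant to guarantee.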

The ratings are calculated with the following formula:

(hits(PL,SE1)/hits50(SE1) + … + hits(PL,SEn)/hits50(SEn)) / n

where n is the number of search engines used. YouTube counts for only 7%, and each of the other search engines for 23%.
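The rating formula is an average of one language's per-engine shares hits(PL,SEi)/hits50(SEi). A minimal sketch, with a hypothetical function name: calling it without weights reproduces the plain division by n from the formula, while the optional `weights` parameter is our own addition, showing how unequal engine weights such as the YouTube weighting could be plugged in.

```python
from typing import Optional, Sequence


def tiobe_rating(shares: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-engine shares into one rating (a fraction; multiply by 100 for %).

    weights=None gives the plain average: sum of the shares divided by n.
    Explicit weights (one per engine, intended to sum to 1) give a weighted average.
    """
    n = len(shares)
    if n == 0:
        raise ValueError("need at least one search engine")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per search engine")
    return sum(w * s for w, s in zip(weights, shares))


# Hypothetical shares of one language on four search engines.
print(round(100 * tiobe_rating([0.10, 0.12, 0.08, 0.10]), 2))  # 10.0
```

Keeping the weights as an explicit argument makes the difference between the unweighted formula and a weighted scheme visible at the call site rather than hidden inside the function.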

Status

Besides the rating of programming languages, the TIOBE chart also indicates a status. Programming languages with status “A” are considered to be mainstream languages. Statuses “A-” and “A--” indicate that a programming language is between status “A” and “B”. If a programming language has a rating higher than 0.7% (yes, this number is arguable, but we had to fix it somewhere) for at least 3 months, it is awarded status “A”. During the first two months the programming language receives status “A--” and “A-”, respectively. The reverse holds for languages that go from status “A” to status “B”. So if a language had status “A” 2 months ago, a rating of 0.607% last month and a rating of 0.687% now, it will have status “A--”.
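The status rules can be read as a level that moves one step per month: up when the rating is higher than 0.7%, down otherwise. The sketch below makes that reading explicit and reproduces the worked example. It is an interpretation, not TIOBE's code: the text only describes steady climbs and steady declines, so treating a language whose rating crosses the threshold back and forth with the same one-step rule is an assumption, as are the function names.

```python
from typing import List

# Status levels from lowest to highest; intermediate statuses written as "A--" and "A-".
LEVELS = ["B", "A--", "A-", "A"]
THRESHOLD = 0.7  # percent; the text requires a rating *higher than* 0.7%


def next_status(current: str, rating: float) -> str:
    """Assumed rule: one level up if this month's rating exceeds the threshold, else one down."""
    i = LEVELS.index(current)
    if rating > THRESHOLD:
        i = min(i + 1, len(LEVELS) - 1)
    else:
        i = max(i - 1, 0)
    return LEVELS[i]


def status_history(start: str, ratings: List[float]) -> List[str]:
    """Apply next_status month by month and return the status after each month."""
    history = []
    status = start
    for rating in ratings:
        status = next_status(status, rating)
        history.append(status)
    return history


# Worked example from the text: status "A" two months ago, then 0.607% and 0.687%.
print(status_history("A", [0.607, 0.687]))  # ['A-', 'A--']
```

Under this reading, a "B" language with three consecutive months above 0.7% passes through “A--” and “A-” before reaching “A”, matching the rising sequence in the text.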

From a supportability point of view, it is strongly advised to stick to mainstream languages for industrial, mission-critical software systems. This is for three reasons:

  • The pool of skilled engineers is much smaller for non-mainstream languages
  • Tool vendors do not write and maintain tools for non-mainstream languages
  • In general fewer libraries are available for non-mainstream languages

It is important to note that this is only one of many criteria to consider before deciding to adopt a language. Other criteria are suitability for the application domain, reliability of compilers, expressive power, performance, and scalability. Hence, Ada can still be used for mission-critical systems, although one should consider alternatives. This is also what you see in daily practice: Ada is hardly used for new mission-critical systems anymore. The reverse is also true: being mainstream does not make a language suitable. Everybody will agree that it is not wise to program missile software in JavaScript.

Groupings and Exceptions

Programming languages that are very similar are grouped together. Currently, the rating of a grouping is based on the maximum of the hits of its individual languages. In the future we will do a better job and take the union (in the sense of mathematical set theory) of all the hits.
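The difference between the current "maximum" rule and the planned "union" rule for groupings can be illustrated with sets of result pages. Search engines report hit counts rather than page sets, so this is only a conceptual sketch: the language names, page identifiers and function names are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical grouping of two very similar languages whose queries partly return the same pages.
grouping: Dict[str, Set[str]] = {
    "LanguageX": {"page1", "page2", "page3"},
    "LanguageX-dialect": {"page3", "page4"},
}


def hits_by_max(members: Dict[str, Set[str]]) -> int:
    """Current rule: the grouping gets the hit count of its highest-scoring member."""
    return max(len(pages) for pages in members.values())


def hits_by_union(members: Dict[str, Set[str]]) -> int:
    """Planned rule: count every distinct page once, whichever member's query found it."""
    return len(set().union(*members.values()))


print(hits_by_max(grouping), hits_by_union(grouping))  # 3 4
```

The union is never smaller than the maximum, and unlike simply adding the members' counts it does not count a page twice when several queries in the grouping return it.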