The App Store continues its rapid growth, with approximately 300,000 apps added each year. For a bit of fun I decided to download as much app metadata as I could in order to find patterns within the App Store data. I found that 75% of apps are free, 60% have no ratings, the Entertainment category has the worst user ratings, developers employ psychological pricing techniques, there is a correlation between price and rating, and a whole lot more ...

Knowledge of the statistics and trends present within App Store data is a valuable tool for both app developers and iOS product developers, like our very own ShinobiControls team.

Downloading The App Store Data

Apple actually make it quite easy for you to access app metadata by exposing an API for searching the App Store. You can use queries of the following form to search for app listings:

The query returns data in JSON format including details such as the supported devices, price, rating, category, release date, app size (in bytes) and more. You can find documentation for this API on the Apple Affiliate Resource pages.
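As a rough illustration, a search URL for Apple's public iTunes Search API could be built as sketched below. The helper names `buildSearchUrl` and `summarise` are hypothetical, the parameter choices (`entity`, `country`, `limit`) are assumptions, and the result field names are from my reading of the API's JSON, so check them against the documentation before relying on them.

```javascript
// Sketch: build a search URL for Apple's public iTunes Search API.
// The default country and limit are illustrative assumptions.
function buildSearchUrl(term, { country = 'us', limit = 200 } = {}) {
  const params = new URLSearchParams({
    term,
    country,
    entity: 'software', // restrict results to apps
    limit: String(limit),
  });
  return `https://itunes.apple.com/search?${params}`;
}

// Pick out the fields discussed in the text from one result object.
// Field names are assumptions about the JSON returned by the API.
function summarise(app) {
  return {
    id: app.trackId,
    name: app.trackName,
    category: app.primaryGenreName,
    price: app.price,
    rating: app.averageUserRating,
    sizeBytes: Number(app.fileSizeBytes), // converted in case the size arrives as a string
    released: app.releaseDate,
    devices: app.supportedDevices,
  };
}
```

Calling `buildSearchUrl('weather')` yields a URL that can be fetched with any HTTP client. The JSON response is expected to carry the matching apps in a `results` array.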

This search API is great if you know what you are looking for, but for this data-gathering project I just wanted to download anything and everything! Unfortunately the App Store doesn't expose an API for navigation and as a result there is no way to obtain a list of apps per category.

To solve this problem I devised a simple method that combined brute force with Monte Carlo sampling. I wrote a simple Node.js app that searched for random three-letter strings. The search results were inspected for new apps, and any new data was stored in a simple file database. The app issued one query every few seconds, and leaving it running overnight gave me a dataset of 75,000 apps!
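The sampling loop described above might look something like the following sketch. It is not the original app: the function names are hypothetical, `search` stands in for whatever performs the HTTP request, and an in-memory `Map` keyed by an assumed `trackId` field replaces the file database.

```javascript
// Generate a random lowercase string, e.g. "qzx".
// `rng` returns a number in [0, 1); it is injectable to make testing easy.
function randomTerm(length = 3, rng = Math.random) {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  let term = '';
  for (let i = 0; i < length; i++) {
    term += letters[Math.floor(rng() * letters.length)];
  }
  return term;
}

// Add any previously unseen apps to the store (a Map keyed by app id).
// Returns the number of new apps found in this batch of results.
function mergeResults(store, results) {
  let added = 0;
  for (const app of results) {
    if (!store.has(app.trackId)) {
      store.set(app.trackId, app);
      added++;
    }
  }
  return added;
}

// One sampling step: search for a random term and record what is new.
// `search` is a hypothetical async function: (term) => array of result objects.
async function crawlOnce(store, search, rng = Math.random) {
  const term = randomTerm(3, rng);
  const results = await search(term);
  return { term, added: mergeResults(store, results) };
}
```

Calling `crawlOnce` on a timer (for example with `setInterval` and a delay of a few seconds) keeps the query rate low. Because popular apps match many search terms, the number of new apps per query can be expected to fall as the store fills up.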

Time to analyse the data …

Category Distribution

Once I had obtained my database of app metadata, I wrote some simple JavaScript data analysis routines which output their results in JSON format. I used D3.js (a tool I have been meaning to try out for ages) to visualise these results.

One of the simplest analyses was to look at the distribution of apps within the various categories:

Nothing terribly surprising there: Games is the biggest category, accounting for 16% of all the apps in the App Store, which reflects not just the popularity of iOS games but also the somewhat broad nature of this category. Weather is the smallest category, no doubt reflecting its restrictive nature.
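A category count of this kind takes only a few lines of JavaScript. The sketch below assumes each app record stores its category in a `primaryGenreName` field; the function name is hypothetical.

```javascript
// Count apps per category and compute each category's share of the total.
// Assumes each app object has a `primaryGenreName` string.
function categoryDistribution(apps) {
  const counts = new Map();
  for (const app of apps) {
    const category = app.primaryGenreName;
    counts.set(category, (counts.get(category) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([category, count]) => ({ category, count, share: count / apps.length }))
    .sort((a, b) => b.count - a.count); // largest category first
}
```

The resulting array of `{ category, count, share }` objects is already in a shape that a D3.js bar chart can bind to directly.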

Ratings Distribution

For any app developer the ratings their users provide via the store are of critical importance. They are the primary mechanism for receiving user feedback, and an app's rating, whether good or bad, will play a critical part in determining whether a user will download an app when it appears in search listings.

I ran a simple query on my data to determine the distribution of user ratings. I found that 60% of apps do not have any user ratings, and of the remaining 40%, I found the following ratings distribution:

It is good to know that very few apps have an average rating of 1.0, and that 4.5 is the mode (i.e. the most frequent value), pointing to a reasonably satisfied bunch of iOS users. However, with the large number of apps in the Games category, I'd expect the ratings distribution from that category to dominate the overall distribution.
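The share of unrated apps, the histogram and the mode can all be computed in one pass, as in the sketch below. It assumes that the rating lives in an `averageUserRating` field and that unrated apps simply lack that field; both are assumptions about the data rather than facts taken from this analysis.

```javascript
// Build a histogram of average ratings and find the most frequent rating.
// Assumes unrated apps have no numeric `averageUserRating` field.
function ratingsHistogram(apps) {
  const rated = apps.filter((app) => typeof app.averageUserRating === 'number');
  const histogram = new Map(); // rating -> number of apps
  for (const app of rated) {
    const rating = app.averageUserRating;
    histogram.set(rating, (histogram.get(rating) || 0) + 1);
  }
  let mode = null;
  for (const [rating, count] of histogram) {
    if (mode === null || count > histogram.get(mode)) mode = rating;
  }
  const unratedShare = apps.length ? 1 - rated.length / apps.length : 0;
  return { unratedShare, histogram, mode };
}
```

Running the same function on the apps of a single category gives the per-category histograms.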

Ratings By Category

I decided to repeat the ratings distribution measurement, but this time creating the histogram on a per-category basis. The following shows the resulting distribution for each category, ordered by the overall average rating for each category:

The ratings distributions differ considerably between categories. Apps in the Entertainment category have the poorest user ratings, with a mode of just 3.0.

File Size Distribution


The metadata for each app includes its file size in bytes, which allowed me to generate the file size frequency distribution shown below:

One interesting feature of this chart is the small 'bump' at around 50 MB. Apps larger than 50 MB cannot be downloaded over a cellular network connection; they must instead be downloaded via WiFi. This limits the user's ability to install apps on impulse, so if an app is close to this limit, developers will try to reduce the size of its assets to bring it in under 50 MB.
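A frequency distribution like this comes from grouping the sizes into fixed-width bins. The sketch below uses 5 MB bins and decimal megabytes. Both choices are assumptions, as are the `fileSizeBytes` field name and the function name.

```javascript
const MB = 1000 * 1000; // assumption: decimal megabytes

// Group apps into fixed-width size bins.
// Returns a Map of bin start (in MB) -> number of apps, ordered by bin start.
function sizeHistogram(apps, binSizeMb = 5) {
  const bins = new Map();
  for (const app of apps) {
    const bytes = Number(app.fileSizeBytes); // the size may arrive as a string
    if (!Number.isFinite(bytes)) continue;   // skip records without a usable size
    const binStart = Math.floor(bytes / MB / binSizeMb) * binSizeMb;
    bins.set(binStart, (bins.get(binStart) || 0) + 1);
  }
  return new Map([...bins.entries()].sort((a, b) => a[0] - b[0]));
}
```

With bins this narrow, any pile-up of apps just below the cellular download limit would show as an unusually tall 45 to 50 MB bin followed by a drop.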

Price Distribution

Another very sensitive factor in determining the success of an app is its price. The App Store allows developers either to give their creations away for free or to sell them for anything between $0.99 and $999.99.

The following pie chart shows the price frequency distribution of the App Store:

The above chart indicates that over 75% of all the apps in the App Store are free. Looking more closely at the price distribution of the paid-for apps, you can see that the prices are clustered around multiples of five dollars. Clearly a case of app developers trying to take advantage of psychological pricing!

Another interesting statistic is the average price by category:

There is a big difference between the average prices of the categories, with Business apps costing six times more than Games on average. Most of the categories with a lower average price are those we associate with casual use: apps that are used for fun and pleasure. The high-cost categories, by contrast, contain apps that are intended to deliver valuable services, where you might expect to get a return on your $12 investment.
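The per-category average is a simple grouped mean, sketched below. The function name is hypothetical and the field names `price` and `primaryGenreName` are assumptions. The sketch also includes free apps in the mean, which may or may not match how the chart above was produced.

```javascript
// Compute the mean price per category, most expensive category first.
// Assumes numeric `price` and string `primaryGenreName` fields;
// free apps (price 0) are included in the mean.
function averagePriceByCategory(apps) {
  const totals = new Map(); // category -> { sum, count }
  for (const app of apps) {
    if (typeof app.price !== 'number') continue;
    const entry = totals.get(app.primaryGenreName) || { sum: 0, count: 0 };
    entry.sum += app.price;
    entry.count += 1;
    totals.set(app.primaryGenreName, entry);
  }
  return [...totals.entries()]
    .map(([category, { sum, count }]) => ({ category, averagePrice: sum / count }))
    .sort((a, b) => b.averagePrice - a.averagePrice);
}
```

Filtering out apps with a price of zero before calling the function gives the average over paid apps only, which is a different and usually much higher figure.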

One final price-related statistic is the correlation between price and user ratings:

The above chart shows quite a strong positive correlation between price and ratings, with more costly apps tending to have a higher rating. But be warned that correlation does not imply causation: putting a high price tag on your app will not necessarily improve your ratings!
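One common way to put a number on such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is an illustration only: the choice of Pearson is an assumption rather than the measure behind the chart above, and the function and field names are hypothetical.

```javascript
// Pearson correlation coefficient of two equal-length numeric arrays.
// Returns NaN when there are fewer than two points or the lengths differ.
function pearson(xs, ys) {
  const n = xs.length;
  if (n !== ys.length || n < 2) return NaN;
  const mean = (values) => values.reduce((a, b) => a + b, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Correlate price with rating, using only apps that have both values.
// Field names `price` and `averageUserRating` are assumptions.
function priceRatingCorrelation(apps) {
  const usable = apps.filter(
    (app) => typeof app.price === 'number' && typeof app.averageUserRating === 'number'
  );
  return pearson(
    usable.map((app) => app.price),
    usable.map((app) => app.averageUserRating)
  );
}
```

Because three quarters of the apps are free, a coefficient computed over all rated apps is dominated by the price-zero group, so it is worth computing it separately for paid apps as well.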

Hopefully you have enjoyed this little exploration of the App Store metadata. If you have any other ideas for potential charts, analyses or correlations to explore, let me know and I'll try them out. I've also been busy downloading even more App Store data ... more of this to follow shortly!

Regards, Colin E.