What is the Apriori Algorithm and how does it work?

The Apriori algorithm is one of the simplest machine learning algorithms.

It is one of the algorithms used for ARM (Association Rule Mining), so before we start with the Apriori algorithm, let us first learn about ARM.

ARM (Association Rule Mining):

Association rule mining helps discover relationships between items in large databases, such as a store's transaction records, and find patterns hidden in the data.

Association rule mining is best suited to non-numerical (categorical) data, such as the items in a shopping basket, where the analysis relies on counting how often items appear together rather than on numeric calculations and formulas.

Association rule mining is one of the ways to find patterns in data. It finds:

  • Patterns that occur together
  • Patterns that are related to one another.

For example,

{onions, potato} => {burger}

Let us assume this rule comes from the sales data of a supermarket. It indicates that if a customer buys onions and potatoes together, they are more likely to buy a burger as well.

It does not mean that if a customer buys a burger, then the customer will also buy onions and potatoes.

If A tends to lead to B, that does not mean that B tends to lead to A.
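To see why the direction matters, here is a minimal Python sketch on a handful of made-up baskets (the data is hypothetical, not taken from the article). For each direction it measures what share of the baskets containing the left-hand side also contain the right-hand side; this ratio is the confidence measure defined later in the article.

```python
# Hypothetical shopping baskets, invented only to illustrate rule direction.
baskets = [
    {"onions", "potato", "burger"},
    {"onions", "potato", "burger"},
    {"onions", "potato"},
    {"burger"},
    {"burger", "soda"},
]

def share_with(baskets, lhs, rhs):
    """Fraction of baskets containing all of `lhs` that also contain all of `rhs`."""
    with_lhs = [b for b in baskets if lhs <= b]
    return sum(1 for b in with_lhs if rhs <= b) / len(with_lhs)

forward = share_with(baskets, {"onions", "potato"}, {"burger"})   # 2 of 3 baskets
backward = share_with(baskets, {"burger"}, {"onions", "potato"})  # 2 of 4 baskets
print(f"{{onions, potato}} => {{burger}}: {forward:.2f}")  # 0.67
print(f"{{burger}} => {{onions, potato}}: {backward:.2f}")  # 0.50
```

The two directions give different values on the same data, which is why a rule and its reverse must be evaluated separately.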

This type of information can help with product marketing and, depending on pricing, with designing offers for customers, which can increase sales.

This is the basic idea of ARM (Association Rule Mining), on which the Apriori algorithm is built.

Apriori Algorithm

The Apriori algorithm is used for mining frequent itemsets and the association rules derived from them. It is designed to operate on databases containing many transactions, for instance, the items bought by customers in a store.

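It repeats two steps, one itemset size at a time:

  • Join: combine the frequent itemsets of size k to generate candidate itemsets of size k + 1.
  • Prune: discard candidates that contain an infrequent subset or whose support is below the minimum support.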
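The join and prune steps can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation; the function name `apriori_gen` and the letter itemsets in the usage example are hypothetical. Itemsets are stored as sorted tuples so that two k-itemsets can be joined when they agree on their first k - 1 items.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Generate candidate (k+1)-itemsets from frequent k-itemsets (sorted tuples)."""
    frequent = set(frequent_k)
    ordered = sorted(frequent)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join: merge two itemsets that agree on all but their last item.
            if a[:-1] == b[:-1]:
                candidate = a + (b[-1],)
                # Prune: every k-subset of the candidate must itself be frequent.
                if all(sub in frequent for sub in combinations(candidate, len(a))):
                    candidates.append(candidate)
    return candidates

pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(apriori_gen(pairs))  # [('A', 'B', 'C')]
```

Here ("B", "C", "D") is produced by the join but removed by the prune, because its subset ("C", "D") is not frequent. The candidates that survive still have to be counted against the database, and those below the minimum support are discarded as well.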

Now let us see how the Apriori algorithm works with an example. Suppose we have the following data, which lists the items bought in each transaction.

Here, Transaction is the transaction number in the store, and Items lists the particular items that were sold in that transaction.

We want to find the combinations of items that were purchased together, so that we have a clear idea of how to increase sales.

We do this by calculating the support of each itemset to find the frequent itemsets, and then the confidence of the rules built from them.

The minimum support and minimum confidence are given to us as thresholds for comparison.

Let us assume that the minimum support is 50% and the minimum confidence is 70%.

Based on our data, the next step is to calculate the support of each item. We can calculate it as:

Support(item) = number of transactions containing the item / total number of transactions
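The article's transaction table is not reproduced in this text, so the Python sketch below assumes a small hypothetical dataset of four transactions, chosen so that the same items (1, 2, 3 and 5) pass the 50% threshold as in the walkthrough. It applies the support formula to every individual item.

```python
from collections import Counter

# Hypothetical transactions (not the article's original table).
transactions = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
]

# Count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in t)

# Support = transactions containing the item / total number of transactions.
support = {item: count / len(transactions) for item, count in sorted(item_counts.items())}
print(support)  # {1: 0.5, 2: 0.75, 3: 0.75, 4: 0.25, 5: 0.75}
```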

Now that we have calculated the support of each item, we can remove the items with support below 50%. The items that are left are:

We are left with items 1, 2, 3 and 5, as they meet the criterion. To continue, we need to form pairs of these items. After pairing, the result is:
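A minimal sketch of the filtering and pairing steps, again on a hypothetical four-transaction dataset (not the article's table): items whose support reaches 50% are kept, and every pair of the surviving items is formed with `itertools.combinations`.

```python
from itertools import combinations

# Hypothetical transactions (not the article's original table).
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_support = 0.5

items = sorted({item for t in transactions for item in t})

# Keep only the items whose support reaches the minimum support.
frequent_items = [
    item for item in items
    if sum(item in t for t in transactions) / len(transactions) >= min_support
]

# Form every pair of the surviving items.
pairs = list(combinations(frequent_items, 2))
print(frequent_items)  # [1, 2, 3, 5]
print(pairs)  # [(1, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]
```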

Now we look for the occurrences of each pair of items in the database and calculate the support count as we did previously.
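Counting pair occurrences can be sketched like this, on the same hypothetical data. The support count of a pair is the number of transactions that contain both of its items; with four transactions, a 50% minimum support corresponds to a count of 2.

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # hypothetical data
min_count = 2  # 50% of 4 transactions

# Pairs of the items that survived the first pass (1, 2, 3 and 5).
pairs = list(combinations([1, 2, 3, 5], 2))

# Support count of a pair = number of transactions that contain both items.
pair_counts = {p: sum(set(p) <= t for t in transactions) for p in pairs}

frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_count}
print(pair_counts)  # {(1, 2): 1, (1, 3): 2, (1, 5): 1, (2, 3): 2, (2, 5): 3, (3, 5): 2}
print(frequent_pairs)  # {(1, 3): 2, (2, 3): 2, (2, 5): 3, (3, 5): 2}
```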

After calculating the support count, the final pairs of items are:

Every pair meets the minimum support criterion. Now we repeat the earlier steps: we combine the pairs into itemsets of three items and calculate the support of each one again.

After calculating the support, the final itemsets are:

Here I did not include the itemsets whose support was zero.

Only the itemset {2, 3, 5} meets the criterion, with a support count of 2.
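The third pass can be sketched on the hypothetical dataset used above. Here candidate triples are built from unions of frequent pairs, and any candidate containing an infrequent pair is discarded before counting (the Apriori property), so it never has to be looked up in the database.

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # hypothetical data
min_count = 2
frequent_pairs = [(1, 3), (2, 3), (2, 5), (3, 5)]  # result of the pair pass

# Candidate triples: unions of two frequent pairs that contain exactly three items.
candidates = sorted({tuple(sorted(set(a) | set(b)))
                     for a, b in combinations(frequent_pairs, 2)
                     if len(set(a) | set(b)) == 3})

# Drop candidates containing an infrequent pair, then count the rest.
candidates = [c for c in candidates
              if all(p in frequent_pairs for p in combinations(c, 2))]
triple_counts = {c: sum(set(c) <= t for t in transactions) for c in candidates}
frequent_triples = {c: n for c, n in triple_counts.items() if n >= min_count}
print(frequent_triples)  # {(2, 3, 5): 2}
```

Before pruning, the candidates are (1, 2, 3), (1, 3, 5) and (2, 3, 5); the first two are dropped because (1, 2) and (1, 5) are not frequent pairs.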

Now we will create the rules from this itemset.

After creating the rules, they look something like this:

These rules simply describe which items were bought together: item 5 was bought together with items 2 and 3, item 2 was bought together with items 3 and 5, and so on.

We wrote the support of every rule as 2 because the support of the itemset {2, 3, 5} is 2, and every rule is generated from that itemset.

Itemset → {2, 3, 5}

Support → 2
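Generating the candidate rules from a single frequent itemset can be sketched as below. Every non-empty proper subset becomes a left-hand side and the remaining items form the right-hand side; each rule inherits the support count of the itemset it comes from, which is 2 here. The variable names are illustrative.

```python
from itertools import combinations

itemset = (2, 3, 5)
itemset_support = 2  # support count of {2, 3, 5}

# Every non-empty proper subset can be the left-hand side of a rule;
# the remaining items form the right-hand side.
rules = []
for size in range(1, len(itemset)):
    for lhs in combinations(itemset, size):
        rhs = tuple(i for i in itemset if i not in lhs)
        rules.append((lhs, rhs, itemset_support))

for lhs, rhs, sup in rules:
    print(f"{lhs} -> {rhs}  support={sup}")
```

A three-item set yields six candidate rules, from (2,) -> (3, 5) through (3, 5) -> (2,).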

Next, we use confidence to decide which of these candidate rules are kept as association rules.

Confidence is calculated as:

Confidence(A → B) = Support(A ∪ B) / Support(A)
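The formula translates directly into code. This sketch uses support counts on the hypothetical four-transaction dataset from earlier; the ratio is the same whether support is expressed as a count or a fraction, because both would be divided by the same number of transactions. The helper names are illustrative.

```python
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # hypothetical data

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

def confidence(lhs, rhs, transactions):
    """Confidence(A -> B) = Support(A U B) / Support(A)."""
    both = support_count(set(lhs) | set(rhs), transactions)
    return both / support_count(lhs, transactions)

print(confidence({2, 3}, {5}, transactions))  # 1.0
print(confidence({2, 5}, {3}, transactions))  # about 0.67, below a 70% threshold
```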

Example:

Confidence of (2^3) → 5 is:

Confidence = Support((2^3) ∪ 5) / Support(2^3)

           = 2 / 2

           = 100%

The supports of {2, 3} and {2, 3, 5} are taken from the values we calculated in the earlier steps, so we simply reuse them here.

As you can see, the confidence we calculated is above the 70% threshold, so we keep this rule. Any rule whose confidence is below 70% is discarded.

I have already calculated the confidence of the remaining rules, and only two of them meet the 70% threshold. The final association rules are:

(2^3) → 5

(3^5) → 2
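Putting the steps together, here is a compact, non-authoritative Python sketch of the whole walkthrough on the same hypothetical four-transaction dataset. The functions `apriori` and `rules_from_largest` are made up for this illustration and are not the API of any library.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset as sorted tuple: support count} for every frequent itemset."""
    min_count = min_support * len(transactions)

    def count(itemset):
        return sum(set(itemset) <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    level = {(i,): count((i,)) for i in items}
    level = {s: n for s, n in level.items() if n >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        keys = sorted(level)
        # Join: merge k-itemsets that share their first k-1 items.
        candidates = [a + (b[-1],) for a, b in combinations(keys, 2) if a[:-1] == b[:-1]]
        # Prune: drop candidates that contain an infrequent k-subset.
        candidates = [c for c in candidates if all(s in level for s in combinations(c, k))]
        # Count the survivors and keep those that reach the minimum support.
        level = {c: count(c) for c in candidates}
        level = {s: n for s, n in level.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

def rules_from_largest(frequent, min_confidence):
    """Rules from the largest frequent itemsets only, mirroring the walkthrough."""
    if not frequent:
        return []
    largest = max(len(s) for s in frequent)
    if largest < 2:
        return []
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) != largest:
            continue
        for size in range(1, largest):
            for lhs in combinations(itemset, size):
                # Every subset of a frequent itemset is frequent, so its count is known.
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rhs = tuple(i for i in itemset if i not in lhs)
                    rules.append((lhs, rhs, conf))
    return rules

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # hypothetical data
frequent = apriori(transactions, min_support=0.5)
for lhs, rhs, conf in rules_from_largest(frequent, min_confidence=0.7):
    print(f"{lhs} -> {rhs}  confidence={conf:.0%}")
# (2, 3) -> (5,)  confidence=100%
# (3, 5) -> (2,)  confidence=100%
```

Restricting rules to the largest itemsets mirrors the walkthrough above. The usual formulation of Apriori derives rules from every frequent itemset with at least two items, which on this hypothetical data would also produce rules such as 2 → 5 (confidence 100%).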
