How to deal with sparse data in Machine Learning?

Sparse data is input data that is incomplete or has missing values, on which we still have to train machine learning models to make predictions.

Data density, on the other hand, is exactly the opposite situation: there is no missing data.

The following table illustrates data sparsity and data density:

Name    Age    Income ($)
27    NA#(0)
Sheila    NA#(0)



In the table above we can see the NAs (Not Available, represented as 0). Treated as a 6 x 3 sparse matrix of 18 elements (6 rows and 3 columns, excluding the column labels), the table has 5 elements with the value 0. So we can say that the input data has 5/18 ≈ 28% data sparsity and 72% data density.

Data sparsity is a real-world scenario that you will come across and have to deal with. It is normal in customer surveys, where gathering personal information is extremely sensitive.
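The sparsity figure can be checked with a few lines of Python. This is only a sketch: the matrix below is hypothetical (its values are made up), and only its shape (6 x 3) and its number of zeros (5) follow the table described above.

```python
# Hypothetical 6 x 3 matrix: the values are invented for illustration,
# only the shape and the count of zeros (5 of 18) follow the text.
matrix = [
    [1, 27, 0],
    [2, 31, 52000],
    [3, 0, 61000],
    [4, 45, 0],
    [5, 0, 48000],
    [6, 38, 0],
]


def sparsity(rows):
    """Return the fraction of elements that are zero."""
    total = sum(len(row) for row in rows)
    zeros = sum(1 for row in rows for value in row if value == 0)
    return zeros / total


s = sparsity(matrix)
print(f"sparsity: {s:.0%}, density: {1 - s:.0%}")
# prints: sparsity: 28%, density: 72%
```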

How do we scale the sparse data?

Scaling is another significant preprocessing task to carry out on sparse data. In scikit-learn, sklearn.preprocessing has a class, MaxAbsScaler, and a function, maxabs_scale, designed for this.

For our example, let us use a simple array with sparse data.


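from sklearn.preprocessing import MaxAbsScaler
import numpy as np

# Small example array in which 0 marks a missing (NA) value
sparse_matrix = np.array([[25, 28,  0],
                          [47,  0, 30],
                          [ 0, 50, 80]])

# fit() learns the maximum absolute value of each column;
# transform() divides every column by that value
scaler = MaxAbsScaler().fit(sparse_matrix)
scaler.transform(sparse_matrix)

array([[ 0.53191489,  0.56      ,  0.        ],
       [ 1.        ,  0.        ,  0.375     ],
       [ 0.        ,  1.        ,  1.        ]])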

Note: the above code was run in the Spyder IDE with Python 3.0.

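MaxAbsScaler divides each column by its maximum absolute value, so every element ends up in the range [-1, 1], or [0, 1] when the data is non-negative as in this example. It does not shift or center the data, so 0 values stay 0 and the sparsity of the input is preserved. One drawback is that this class does not take care of outliers: a single very large value in a column pushes all the other values in that column toward 0.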
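To make the arithmetic behind this kind of scaling explicit, here is a minimal hand-written sketch in plain Python. It is not scikit-learn's implementation; the function name max_abs_scale is made up for this example, and leaving an all-zero column unchanged is an assumption of the sketch.

```python
def max_abs_scale(rows):
    """Divide every column by its largest absolute value.

    A column that contains only zeros is left unchanged
    (an assumption of this sketch).
    """
    columns = list(zip(*rows))
    scales = [max(abs(v) for v in col) or 1.0 for col in columns]
    return [[v / s for v, s in zip(row, scales)] for row in rows]


data = [[25, 28, 0],
        [47, 0, 30],
        [0, 50, 80]]
scaled = max_abs_scale(data)
print(scaled[1])  # [1.0, 0.0, 0.375]

# Negative values keep their sign and zeros stay zero.
signed = max_abs_scale([[-4.0], [0.0], [2.0]])
print(signed)  # [[-1.0], [0.0], [0.5]]

# A single outlier (1000) squeezes the other values toward 0.
with_outlier = max_abs_scale([[10], [20], [1000]])
print(with_outlier)  # [[0.01], [0.02], [1.0]]
```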


Google AdWords replaced Sidebar Ads with Product Listing Ads


I happened to notice that Google AdWords has replaced sidebar ads with Product Listing Ads. Now you see just 4 ads at the top and 3 to 4 ads at the bottom of the Search Engine Results Page (SERP).

This is the SERP I got for the keyword “T Shirt Printing in Bangalore”.

Notice just 4 ads at the top

Adwords 4 Ads at Top

3 ads at the bottom of the Search Engine Results Page

Adwords Bottom Ads

What about side bar ads?

Side bar ads have been replaced by Product Listing Ads.

Side Bar Ads Replace with Product Listing Ads

Why the change?
1. Google might have noticed that people are not engaging with the text or search ads that appear in the side bar. Consequently, side bar ads are poor candidates for a good CTR, and hence result in a poor conversion rate.
2. Google wants us to associate small images with text or search ads, as we do in Facebook ads. That is possible through Product Listing Ads.
3. Google is convincing us to use Product Listing Ad technology to enhance the user's search experience. I like this campaign type because it is my chance to bring my virtual store onto the Search Engine Results Pages with appealing product images.
What does Google say?
There is no official confirmation from Google about this new AdWords ad display format, but there is a buzz around the internet. Check out the latest SEMPOST article for further reading.

Adobe SiteCatalyst first-party and third-party cookies. Which to choose?

Cookies are fundamental building blocks of web analytics tools. A cookie is a small piece of data placed in the visitor's browser to track some vital information about the visitor.

However, the use of web analytics cookies in general, and of SiteCatalyst or Google Analytics cookies in particular, is subject to the rules on Personally Identifiable Information (PII) under the “European ePrivacy Directive”.

Cookies can be classified as First Party and Third Party Cookies.

First-party cookies are domain-specific cookies placed in the visitor's browser by the web analytics vendor on behalf of the customer. With first-party cookies, the visitor information collected by Adobe SiteCatalyst is not shared with any other domain or party.

As a first-party cookie, it collects all data anonymously, with no reference to any personal data whatsoever.

First-party cookies set by SiteCatalyst Reporting and Analytics recognize visitors who move across subdomains and top-level domains. For example first party cookies set on also recognizes the visitors who traverse across music. or music.

First-party cookies are:

  1. s_cc: This cookie checks whether the browser is set to accept cookies or not. The default value is ‘true’, i.e. cookies are enabled; otherwise it is ‘false’.
  2. s_sq: This cookie keeps the click map data from the previous page.
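As a small illustration, the two cookies listed above can be picked out of a Cookie header with Python's standard library. This is only a sketch: the header string is hypothetical, and the s_sq value is a placeholder rather than real click map data.

```python
from http.cookies import SimpleCookie

# Hypothetical Cookie header; the s_sq value is a placeholder,
# not real click map data.
header = "s_cc=true; s_sq=placeholder"

jar = SimpleCookie()
jar.load(header)

# s_cc is 'true' when the browser accepts cookies.
cookies_enabled = "s_cc" in jar and jar["s_cc"].value == "true"
# s_sq is present when click map data from the previous page was kept.
has_clickmap_data = "s_sq" in jar

print(cookies_enabled, has_clickmap_data)  # True True
```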

Third Party Cookies

Adobe also uses third-party services like and to track and collect visitor data. These cookies are third-party cookies, where the data collected by these third parties may be shared [there is no evidence] with other domains in order to target the user on those domains for remarketing or retargeting. On the other hand, the majority of browsers have sophisticated filters that reject third-party cookies because of security concerns. Surprisingly, even first-party cookies can be rejected. This threatens the accuracy of the data collected by enterprise-level web analytics tools.

s_vi[##] is the third-party unique visitor identification cookie placed by if you have chosen to go with third-party cookies. Alternatively, you can opt for a first-party cookie if you choose to work with first-party cookies.

Last but not least, SiteCatalyst cookie usage is governed by the “European Union ePrivacy Directive”. Learn from the leader: visit the Adobe Digital Marketing Blog for more information about the European Union ePrivacy Directive.
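The difference between the two kinds of cookie comes down to whether the cookie's domain matches the site being visited. The sketch below is a simplification under stated assumptions: the function is_first_party and all domain names are hypothetical, and real browsers apply stricter rules (for example, a list of public suffixes) that are not modelled here.

```python
def is_first_party(page_host: str, cookie_domain: str) -> bool:
    """True when the cookie's domain is the page's host or a parent of it."""
    host = page_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


# Hypothetical hosts and cookie domains, for illustration only.
print(is_first_party("music.example.com", "example.com"))       # True
print(is_first_party("www.example.com", "tracker.example.net")) # False
```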

Adobe Omniture VISTA rules: Server-side Implementation for Real-Time Data Manipulation

Visitor Identification, Segmentation & Transformation Architecture

What are VISTA rules?
VISTA stands for Visitor Identification, Segmentation & Transformation Architecture. It is an Adobe SiteCatalyst technology for implementing rules or applying logic on the server side, after data has been collected on the website but before the processed data is sent to the Adobe Analytics server.

VISTA rules are a real-time data manipulation system that resides on the server side, where any traffic variable or custom variable can be created or deleted. There is no limit on the number of VISTA rules you can create for each report suite.

VISTA rules can act on data sent in the HTTP header and on data sent through the Reports and Analytics code.

Who implements VISTA rules?
The Adobe Engineering Team plays a major role in implementing VISTA rules. Clients need to identify the requirements for a VISTA rule and share them with the Adobe Engineering Team. The team hard-codes the requirements and tests the rules; once the desired results are obtained, the scripts are pushed to the production server. This does not come for free: you need to pay a stipulated fee for it.

VISTA Rule Implementation Use Case
Imagine an insurance company with Direct Sales Associates (DSAs) appointed across all the states in India. An online enquiry form is placed on the website to acquire leads. The enquiry form has a drop-down menu populated with all the DSA names along with the insurance company’s state-specific sales teams.

When a prospect fills in the web enquiry form, the sale is attributed to a specific DSA or sales team based on the option chosen by the visitor.

For this requirement, the insurance company’s developer team has to send the data to the server, where the Adobe team sets up a VISTA rule to segment the data based on the option chosen by the prospect.
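The logic of such a rule can be sketched in Python. This is purely conceptual and is not Adobe's code: real VISTA rules are written and deployed by the Adobe Engineering Team, and the query parameter dsa_choice, the variable lead_owner, the lookup table and the URL are all hypothetical names chosen for this example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup table: drop-down option -> owner of the lead.
LEAD_OWNERS = {
    "dsa_ravi": "DSA: Ravi",
    "team_karnataka": "Sales team: Karnataka",
}


def attribute_lead(hit_url: str) -> dict:
    """Return the extra variable a rule might attach to one hit."""
    query = parse_qs(urlparse(hit_url).query)
    choice = query.get("dsa_choice", [""])[0]
    return {"lead_owner": LEAD_OWNERS.get(choice, "Unassigned")}


print(attribute_lead("https://insurer.example/enquiry?dsa_choice=team_karnataka"))
# {'lead_owner': 'Sales team: Karnataka'}
```

Hits without a recognized option fall back to "Unassigned", so no lead is silently dropped from the segmentation.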

What are the other situations where VISTA rules can be implemented?

  1. To filter out internal traffic to the company website.
  2. Real-time visitor segmentation based on demographic factors such as age and geographical region. Visitors can be segmented on any factor we capture through form text fields. These form text fields are sent either through a SiteCatalyst variable or as query parameters appended to the URL.
  3. To capture order values along with the product line, in order to target customers for upselling or cross-selling.
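As an illustration of the first situation, filtering internal traffic amounts to dropping hits whose IP address falls inside the company's own network ranges. The sketch below is conceptual Python, not an actual VISTA rule: the network ranges, the hit records and the function name is_internal are hypothetical.

```python
import ipaddress

# Hypothetical office network ranges.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_internal(ip: str) -> bool:
    """True when the address belongs to one of the internal networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)


# Hypothetical hits collected from the website.
hits = [
    {"ip": "10.1.2.3", "page": "/home"},
    {"ip": "198.51.100.7", "page": "/quote"},
]

# Keep only the hits that come from outside the company.
external_hits = [hit for hit in hits if not is_internal(hit["ip"])]
print(external_hits)  # [{'ip': '198.51.100.7', 'page': '/quote'}]
```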