Handicrafts electric business platform Etsy personalized recommendation

By Lewis Cole,2015-02-19 14:09
39 views 0
Handicrafts electric business platform Etsy personalized recommendation

    Handicrafts electric business platform Etsy personalized


    Provide personalized recommendation is very important for online shopping

    market.Personalized recommendation is beneficial to both the sellers and the buyers, buyers don't have to search yourself can get directly to the information they are interested in products, the seller can get better at the smaller marketing product visibility.In this article, we will look at us in Esty (America online store platform, arts and crafts products trading as the main characteristic, the use of some of the recommendations.All of these methods of graphs are included in our open source machine learning package "[Conjecture] ", this package in [The previous articleDescribed in existing].

    The recommended calculation process basically consists of two stages.The first stage, we based on historical data matrix to establish a user interest model.For example, listing their purchasing records or their collections (not familiar with the students of linear algebra and matrix see [This article]).This model provides the user and the eigenvectors of the item, their inner product shows a user to a certain degree of items of interest (a high value on behalf of a larger estimate the degree of interest).The second phase, we will calculate the recommended as a result, by maximizing valuation interest degree, get a set of items for each user.

    About the model of users and items can also be used in other ways, for example is used to find users with the same interests and hobbies, look from the perspective of a "taste" has the similarity of items, as well as the complementary items can be purchased together.

    Matrix decomposition

    Have recommended the results of the first phase is to model and the data of users and items.In etsy, we deal with the implicit feedback data, only observe the user interactions with the objects (such as a collection or to buy).This grade and users to experience items (for example) a three-point play under 5 point as the "feedback" is just the opposite.We use a binary matrix represents the implicit feedback, each element has a value of 0 or 1, 1 like (such as collection) this item on behalf of the user, 0 on behalf of the user does not have to do so.0 is not necessarily said users are not interested in this item, but it only means that they on the item has not yet shown interest.This may be because they do not care or are not interested in, or is due to the user when browsing also didn't see this item.

    Implicit feedback data set, a group of users interested in all kinds of goods.Please note that we do not collect explicitly don't like behavior, only collect user collection or not.

    Matrix decomposition model can effective support the hypothesis is: in the correlation between users and items, can pass a low-dimensional linear model to explain.This means that each item and the user really real vector corresponding to a not observation, in some small dimensions.Space potential characteristics of the coordinates of the corresponding item (can be: if the item is clothing, whether it has a v-shaped logo, whether the background of the picture is brown, etc.), user vector element describes the user preference for these characteristics.We can put these vector matrix, one for the user, one for the goods, so the observed data is by taking two position matrix and add noise to produce.

    Low dimensional model from observed implicit feedback, "d" is the dimension of the model.

    As a result, we found a for each user and each item of the vector can represent them.We calculate these vectors, which makes the user vector and vector, the inner product between the items is close to the observed values implicit feedback matrix (in the case of user like this item, this value is close to 1, otherwise close to zero).

    A two-dimensional model is used for the results of the data set, in this case, the first to be found is characterized by item is a shelf, a second characteristic is: if a "geometry" style.

    Because in the matrix of zero is not necessarily said are not interested in things, we don't want to force model is suitable for them, because the user actually might be interested in some of the items are.As a result, we find the decomposition, to minimize the weighted error function, among them, the data matrix of the nonzero entries is higher than zero item weight.This comes from a previous article [The articleThis method], this article put forward.How to set up the weight depends on the degree of sparse matrix, and can through some form of [Cross validationTo find out.

    When we optimize the weight loss function reconstruction matrix (the product of two factors), in the input matrix and the matrix contains 0 will contain some positive elements, because we don't force model is suitable for these and not zero.These are all user might be interested in but haven't seen the goods.Reason is that this is the case in order to make the model better applicable, those who have been interested in the item set at the intersection of users will have similar vector, the goods also is such.So, not read, but by other users with similar interests items like, after reconstruction matrix, there will be a higher value.

    Alternating least squares method

    In order to optimize the model, we in item matrix alternation between the user and calculation, and in each stage to minimize the weighted average error, maintain the stability of another matrix (so named alternating least squares).At each stage, because there are feasible analysis scheme, we minimize the weighted square error can be calculated accurately.This means that each iteration are guaranteed not to increase the total error, but is reduced to two matrix form a local minimum error function.As a result, the whole process will gradually diminish error until you reach the local minimum.Minimum quantity may be different, so it may be a reasonable idea, repeat the above steps, and choose the best one, even though we don't do that.

    R of the method in [the Demo codeHere,]

    The calculation process is very natural to implement in the graphs.Because, for example, when updating a user vector, all that is needed is: he interactive item vector, and a small square matrix, by using matrix and its transposed matrix multiplication.Calculate each user in this way, and can use the limited memory, and each user can be updated in parallel.Update items similar methods.Some users like to a lot of things, or some goods by many users like, these calculations require more memory.In this case, we are sampling the input matrix, or filter out these items, or only for each user recently like baby.

    When we are satisfied with the model, we can continue to update it (through repeated every night ALS several steps), as more and more information, including users, commodities, collect...Into the online system.New items and users can be easily integrated into the model, as long as they are with the model in the existing enough interaction between users and items.

    In this way the production of graphs code [Here,]

    SVD (singular value decomposition)

    Alternating least squares method described above, gave us a simple graphs through the user preference matrix factorization method.However, its disadvantage is need multiple iterations, sometimes take a long time to get a good result.An attractive alternative is random SVD.This is a newer method, similar to the famous SVD in a large matrix, using a non iterative MapReduece implementation.We will implement it as a function, it can be any an extensible graphs Job calls.

    In linear algebra is a basic conclusion is: through the study of the singular values of several latitude truncation after the formation of the matrix, are all the same rank matrix (from the Angle of (yi - (mxi + b))) the best approximation matrix.However, we have noticed that when using this method, we can't use in ALS used in the optimization method: for error values assigned to the same "weight" method.But for some of the data set, if the number of its zero without overwhelming than non zero, then this method is feasible.For example, we can use it for the user to successfully build a collection of behavior model, and can't use it to purchase behavior to build a useful model, because buy more sparse, the weight is necessary.

    One advantage of this approach is that it produces the matrix have good regularization of the structure, thus can more easily online structural vector for new users, because there is no matrix inversion is a must.We also use this method to produce other items except user preferences listing vector, the treasures, for example, or other users in Etsy to create the list.In this way, we can list from other, recommend other related items.

    Product recommendations

    Once we have users and items model, we can use it to build product

    recommendations.This is a step easily ignored most of the research literature.We cannot expect to calculate users and items, for example, the product of matrix, and then find the best

    undiscovered items for each user, because it requires time and proportional to the item number and the product of the users, and both are in the hundreds of millions of level.

    A research paper is recommended to use a tree data structure, to allow the above space is not the end of search, by reducing the concentration of the whole item inner product is too small.However, we observed that this method did not work well in practice, may be due to the dimension and we use the model type of curse (hundreds of dimensions).

    Therefore we recommend using approximate method to calculate the results.The first candidate set to produce the goods, and then rank them according to the inner product, and take the highest.There are several ways to generate candidate set, for example, from the user like to collect the list of stores, or collection of the text is similar to users.But the main method we use is local sensitivity hashing (LSH), which we put the users and items of vector space into several hash box, to take those are mapped to the same box items set items for customization as per set.

    Local sensitive hash

    Partial sensitive hash is a kind of used for the large data set to find the approximate minimum neighbor technology.This technology has several variants, but we focus on one of these, is designed to deal with real data and the approximate nearest neighbor based on Euclidean distance.

    The idea of the method is dividing the space into a set of hash bucket, to make them close to each other's point in the space is likely to fall into the same bucket.We are doing this by building in the space plane some Numbers from "p" to make them all through the origin.The divided space into 2 ^ P convex cone, each of which constitute a hash bucket.

    In fact, we can be on behalf of the normal vector of plane to do this.A point fall in the side of the plane, and then determined by the symbol of points and normal vector inner product (if the plane is random, there is no doubt that we will get the non-zero inner product, but in principle, these points can be arbitrarily allocated to one side or the other).These normal vector we only need to random uniform in all directions in space.It is well known that the same-sex gaussian distribution map has such features.

    We label the hash bucket, if a point and the inner product of the ith first is positive, the hash code of the ith 1, otherwise 0, this means that each plane is responsible for a bit of hash code.

    At each point we will map to the respective corresponding hash bucket, we can calculate the approximate nearest neighbor or equivalent approximate recommended, by examining the vector in the same bucket.The average number of barrels will be 2 ^ {P} - the total number of points, therefore, to use more plane making process is very efficient.But it also reduces the accuracy of approximation, as it reduces the near point to any target point can be in the same bucket.

    Therefore, to achieve efficiency and the quality of the compromise, we need to be repeated many times hash process, and then combined output.Finally, in order to gain more control of the calculation process, we abandoned the larger hash cases, so as to realize efficient nearest neighbor is calculated.

    Implement the code in the [using a ConjectureHere,]

    Other ideas

    The above is the basic technology of personalized recommendation.In the process of developing these recommendation system, we found some place to improve, to help us improve the quality of recommendation.

    ; Standardized calculation before the vector: as mentioned earlier,

    matrix decomposition model tends to produce large-scale vector on

    popular items.As a result will probably recommend some popular

    items for many users, even if they are not necessarily the most taste

    for those users.Therefore, before the recommended calculation, we standardize all items in the vector.It also makes use of close to the nearest neighbor theory sounds feasible: because vector has unified benchmark, so users can vector the maximum amount of product by vector to get all of the recent items.

    ; Diversity: shop Etsy is a market made up of many sellers.To fair treatment of these sellers, we limit the show each user with the shop's recommendation results.And because the user always can enter the store with one click, browsing the store other items, it seems, will not affect the quality of recommendation.

    ; Items diversity: in order to make the recommendation results diversification, we to the user choose a set of 100 candidate nearest neighbor, and then filter out on the higher rank items have similar distance, the distance is the Euclidean distance between items vector by calculation.

    ; Relining similar users items: we found the user use LSH code between the nearest neighbor (so for each user, we can find that with the same interests and hobbies of users) and then to produce item recommendation results, we can use the list of user preferences, according to the vector and target users vector inner product between rearrange them.This should be able to get the looks better and more relevant recommendations, but still need a proper experiments to verify.

Report this document

For any questions or suggestions please email