Dinosaur Riding Scrum Ninja Jedi Masters

Agile is so last year...

Designing the Scoring Engine

In a previous post, I discussed the differences between filtering on constraints and scoring on preferences. As of the last timebox, filtering on game length was implemented (that is, using the interface, a user could return games of only certain lengths). For this timebox, I completed the filtering step on the only other constraint that requires filtering: the number of players.

That is the easy part. Scoring games can get very tricky. At the highest level, it’s very intuitive: you simply compare the user preferences to the game attributes and sum up scores for each match. The most pertinent question is, what will the numerical score values be based on? Will they be normalized? To answer the second question first: there is no need to normalize the scores for each match. The only place the scores will be used is in ordering the games presented to the user. In other words, the scores need only be relative.

What defines a match? If we only considered a match to be the user’s preference being exactly the same as the game’s attribute, with everything else earning no score, then our job as programmers would be fairly easy. However, the fact of the matter is that many games may have partial matches; they match to some degree. For example, if a user indicates that they prefer a light game, we may want to give the full score to all light games, but maybe 75% of the points if the game is medium light, 45% if medium, and 10% if medium heavy. Unfortunately, this implies that a scoring function needs to be written for each attribute individually, as the degree to which an attribute matches a preference depends on the attribute.
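
As a rough illustration, the scoring function for an attribute could be a lookup from the game’s value to a fraction of the full score. A minimal sketch in Clojure, using the hypothetical numbers above (the keyword names and point values are placeholders, not the final implementation):

(def weight-scores
  ;; fraction of the full score for a user who prefers light games
  {:light        1.00
   :medium-light 0.75
   :medium       0.45
   :medium-heavy 0.10
   :heavy        0.00})

(defn score-weight
  "Score a game's weight for a user who prefers light games."
  [game]
  (* 100 (get weight-scores (:weight game) 0.0)))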

The final question is, where do we get the scores? This brings us back to the original goal of this project: to emulate an expert in the field of board game recommendations. We don’t need to come up with the scores ourselves; we need to get experts to tell us what the scores on the attributes ought to be. Going forward, this will be important information to extract from our experts.

The last thing to consider is that we will be adding explanations of scores for display on the results page. This means that as each match is made, a corresponding string must be written or generated that describes what was matched and how much it affected the score. We can simply maintain a running list of explanations alongside the score to pass back from the scoring engine.
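
A minimal sketch of that accumulation, assuming each scoring function returns its point contribution along with a reason string (the map shapes here are illustrative):

(defn apply-scorers
  "Run every scoring function against the game and the user preferences,
  summing points and collecting explanation strings along the way."
  [game prefs scorers]
  (reduce (fn [acc scorer]
            (let [{:keys [points reason]} (scorer game prefs)]
              (-> acc
                  (update :score + points)
                  (update :explanations conj reason))))
          {:score 0 :explanations []}
          scorers))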

Similarity Engine Design and Machine Learning Tools

My responsibilities for Timebox #3 have primarily revolved around the machine learning extension to our project. After the expert interviews and discussions with potential users that took place during this timebox, it was decided that a search providing recommended games based on an individual game the user has enjoyed playing would be very useful and should be surfaced as a main feature of the application, giving greater weight to this portion of the project. Elaborating on my previous post on the topic, I will discuss some of the design and implementation decisions we have made.

Creating a Training Set

One of the challenges of developing a machine learning system, in our case, is having a reliable and sufficiently large training set to use to train the machine implementations we decide to use. This responsibility will fall on our experts and other experienced members of CABS. The number of games in our current input space is currently at 1000, and may be expanded in the future. Since his experts have to be able to rate games in terms of each other, we decided that an interface that randomly selects a set of ~16 games, and then asks which of those games the user is familiar with, would result in much more efficient training. These selections would then be passed to the game comparison screen, described previously, in which the users selects on a sliding scale how much they would recommend the second game based on someone enjoying the first.
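
The random selection itself is straightforward; a sketch, assuming all-games holds the full game collection:

(defn sample-games
  "Randomly pick n games for an expert to review."
  [all-games n]
  (take n (shuffle all-games)))

(sample-games all-games 16)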

One issue that arises from this system is the potential for multiple experts to rate the same pair of games. Because of this, and because we don’t want experts to be presented the same game pair more than once, it is necessary to keep track of which training patterns belong to which expert. When it comes time to actually train the system, any duplicates will be averaged together to create the final training set.
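
A sketch of that averaging step, assuming each rating is stored as a map holding the expert, the game pair, and the slider value (the exact shape is illustrative):

(defn average-duplicates
  "Collapse multiple ratings of the same game pair into one training
  pattern whose value is the mean of the individual ratings."
  [ratings]
  (for [[pair rs] (group-by :pair ratings)]
    {:pair  pair
     :value (/ (reduce + (map :value rs)) (count rs))}))

(average-duplicates [{:expert 1 :pair [12002 822] :value 0.75}
                     {:expert 2 :pair [12002 822] :value 0.25}])
;; => ({:pair [12002 822], :value 0.5})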

Input Data

Another step in implementing the system is to produce a set of input data that can be fed through the network. Currently, a record for a single game in the database looks like the following:

{
  "_id" : ObjectId("507ddc8538ef1f102b000001"),
  "bgg_id" : 12002,
  "name" : "Jambo",
  "thumbnail" : "http://cf.geekdo-images.com/images/pic1244694_t.jpg",
  "image" : "http://cf.geekdo-images.com/images/pic1244694.jpg",
  "min_players" : 2,
  "max_players" : 2,
  "min_age" : 12,
  "length" : 45,
  "year_published" : 2004,
  "rating_average" : 7.13898,
  "rating_bayes_average" : 7.01279,
  "rating_std_dev" : 1.19381,
  "rating_median" : 0,
  "weight_average" : 2.0779,
  "weight_num" : 693,
  "rank" : 233,
  "tags" : [
    {
      "bgg_id" : 1002,
      "subtype" : "category",
      "value" : "Card Game"
    },

    ...

  ],
  "suggested_age" : {
    "2" : 0,
    "3" : 0,
    "4" : 0,
    "5" : 0,
    "6" : 1,
    "8" : 2,
    "10" : 14,
    "12" : 8,
    "14" : 2,
    "16" : 0,
    "18" : 0,
    "21 and up" : 0
  }
}

However, in order to provide appropriate input for a machine learning implementation, the input must be a vector of numerical values, so I have begun to write a map function to perform this conversion, which can be seen here. The output for the game then looks like the following:

[0.0075 0.25 0.25 0.6666666666666666 0.233 0.3465]
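
The real function handles more attributes, but a simplified sketch of the idea is to scale each attribute by an assumed maximum for its domain (the attributes and maxima below are illustrative):

(def attribute-maxima
  ;; assumed upper bounds for each attribute's domain -- illustrative only
  {:min_players 8, :max_players 8, :min_age 18, :length 360})

(defn game->vector
  "Scale each selected attribute of a game record into the range [0, 1]."
  [game]
  (vec (for [[attr max-val] attribute-maxima]
         (double (/ (min (attr game) max-val) max-val)))))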

One interesting issue with our data set is the possibility of a very large input vector. Ryan and I have discussed this issue and have been researching techniques such as principal component analysis, among others, as a possible way to reduce the number of dimensions of our input. The primary concern is training performance; however, I have not yet had the chance to do any sort of dummy training to get an idea of what the limitations are going to be.

Machine Learning

Initially, we had discussed using Weka as our machine learning library. However, after looking into the documentation, it seemed that its neural nets and other features were primarily geared toward classification. Our problem falls more into function approximation, as we want to train with and return values in a continuous range rather than a discrete one.

After looking at my options, I decided to go with Encog. Encog has a reputation for being mature and rather optimized in terms of performance. It also has a well-documented Clojure wrapper, Enclog. Encog offers a variety of network types as well as training algorithms that I will need to experiment with to determine what approach is most appropriate for our input data.
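
Based on the examples in Enclog’s documentation, constructing and training a feed-forward network should look roughly like the following; the topology and training parameters here are placeholders to be tuned through experimentation:

(ns similarity.core
  (:use [enclog nnets training]))

;; 12 inputs = two 6-element game vectors; one continuous output.
(def net
  (network (neural-pattern :feed-forward)
           :activation :sigmoid
           :input  12
           :hidden [8]
           :output 1))

(defn train-similarity-net
  "Train on seqs of input vectors and single-element target vectors."
  [inputs ideals]
  (let [dataset (data :basic-dataset inputs ideals)
        method  (trainer :resilient-prop :network net :training-set dataset)]
    (train method 0.01 500 []))) ; stop at 1% error or 500 iterations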

Returning the Results

In order to quickly return the values generated by the network, we went ahead and set up a Redis instance that is hooked to our Heroku account. We will be using the sorted set data structure to hold the results. A sorted set in Redis takes input in the following format:

key score member

In our situation, the data we will be inputting will look like the following:

GameID-A similarity-value GameID-B

This way, we will be able to very efficiently query which games, given some game A, have a similarity value in a certain range. The zrevrange command will most likely be the most appropriate for making these queries. Once the game IDs have been retrieved, the normal game database can then be queried to provide full details on each game.
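
For example, with hypothetical similarity values (using Jambo’s ID, 12002, as game A), storing two results and fetching the ten most similar games would look like:

ZADD 12002 0.87 822
ZADD 12002 0.42 13
ZREVRANGE 12002 0 9 WITHSCORES

ZREVRANGE walks the set from highest score to lowest, so the first members returned are the most similar games.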

Timebox 3

For detailed information on accomplishments from this timebox, see the links below:

Interviews

  1. Jeff Horger
  2. Jeff Kayati
  3. Corey Sanders

Worklog

Chris Powers

  • Front-end mockups
  • Tri-State button functionality

Alex Burkhart

  • Load mongodb from BGG data
  • Get simple query functionality working
  • Make prototype query results page.

Ryan McGowan

Ryan primarily managed code being included into our master branch. He did most of the merging of features and refactored several components.

  • Create initial project scaffolding.
    • Bootstrap, Leiningen, Mongo and the wiring between them to have a running app. This includes the config functionality.
  • Refactor Javascript for query interface.
  • Corrections to markup for query interface to properly use bootstrap scaffolding.
  • Feature branch merges
  • Refactor expert.js and the expert select interface to use bootstrap scaffolding.
  • Several other style changes.
  • Interview writeups and notes

David Albert

Commit History

For more information on who has done what, see the commit history on our GitHub project.

Attribute Search Interface Implementation

The most fundamental aspects of the query interface have been implemented. Though more attributes may be added in the future, we decided to start with a select few that we believed encapsulate the differing types of input we receive from the user. In particular, we expect the user to select discrete values for some attributes, such as number of players and time to play (in discrete intervals), and to use tri-state controls that capture the user liking, disliking, or not caring about a particular attribute. Prototypes were built to represent these input types and are currently present on the interface:

Fundamental interface

These input types maintain hidden values in the background so that they can be submitted to the server with the rest of the form.

Attribute Search Interface Design

The most significant interface to our system is by far the query-construction interface. This is the first page that comes up when a user lands on the engine’s webpage, and it encapsulates the core functionality of the system: taking in user-specified attributes and providing back a list of matching results.

The interface is tricky in a few significant ways. Firstly, there are potentially tens of game attributes that we could surface as options. Fortunately, we have our experts to narrow these down to the relevant ones. Secondly, each attribute has a different domain of permissible values. There’s no way around this one; we’ll have to build an input selector to match each attribute. Thirdly, even with the reduced list of attributes, the user will still likely only care about a subset of them. We’d prefer that the user only had to interact with those attributes they consider relevant to what they’re looking for.

This last issue brings us to the design of the interface. After some thought, I proposed to the team two layouts for constructing queries:

Attribute selection

The user selects attributes to the left, then chooses the particular desired or undesired values of those attributes.

Accordion selection

The user selects the attributes in the middle, then the area for manipulating value preferences of those attributes appears under the attribute (sliding out like an accordion).

The two styles are designed around the notion that users shouldn’t have to interact with attributes that they don’t care about. In both interfaces, the user only selects attributes that they have a preference on. It’s important to note that we intend to take negative preferences into account as well, a decision made based on feedback from our experts.

After proposing the two designs to the team, we decided to go with the first, because we believed it looked better and made better use of the space.

The Challenges of Implementing a “Similarity” System

In the context of our project, we would like to not only provide games directly based on a user’s preferences, but also recommend games that are similar to the ones provided by our rule-based system. However, a challenge arises when we try to define the term “similar.” How do we know that a user will like a game that is similar in attributes to another? What makes certain types of games desirable if a user liked one particular game? To address this problem, we decided that utilizing some sort of clustering, categorization, or machine learning technique would be beneficial.

To start with the most basic of techniques, we first need to represent our knowledge base numerically. This would take the form of a vector:

< a1, a2, a3, a4 … an >

where an is the numerical representation of some attribute of the game. For example, a1 may be the minimum number of players and a3 may be the number of minutes a game typically takes to play. One metric of game similarity would then be the Euclidean distance between two such vectors.

Minimizing this distance would yield the two games that are most numerically similar to each other. To query such a system quickly, a data structure such as an octree or k-d tree could be used. It is also worth noting that since each an covers a different domain, it would be necessary to normalize each domain to a range of values such as (0.0, 1.0); otherwise, attributes with large numeric domains would have a greater impact on the Euclidean distance. This issue leads us into our next line of thought: weighting different attributes by their importance to the user.

One way of introducing importance into this equation is to weight each attribute of the game by a constant, in this case cn, where cn is some value in the range (0.0, 1.0):

< c1*a1, c2*a2, c3*a3, c4*a4 … cn*an >

A cn value of 0 represents the least importance, as it forces the nth attribute to 0 in all vectors. Conversely, a value of 1 represents the greatest importance to the user, as it allows that attribute to have the largest impact on the resulting distance.
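
A sketch of this weighted distance, assuming the vectors have already been normalized as described above:

(defn weighted-distance
  "Euclidean distance between two normalized game vectors a and b,
  with per-attribute importance weights c in the range [0.0, 1.0]."
  [c a b]
  (Math/sqrt
   (reduce + (map (fn [cn an bn]
                    (let [d (* cn (- an bn))]
                      (* d d)))
                  c a b))))

(weighted-distance [1.0 0.5 0.0] [0.2 0.9 0.4] [0.3 0.1 0.9])
;; the third attribute is ignored entirely, since c3 = 0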

An alternative method of achieving “importance” when comparing attribute values is with Gaussian curves. In this case, with a curve centered at μ = an, some threshold would be chosen (for example, 0.75), and all values for which the curve is greater than that threshold would be accepted as “similar.” To introduce the idea of importance, σ can be changed for each n. A low σ results in a very narrow curve, and thus a very narrow range of “similar” values in that axis, representing high importance, whereas a large σ results in a very wide range of values, representing low importance.

In order for this method to work as intended, the curve must be scaled so that its peak is always at a value of 1.0; since the normalized Gaussian peaks at 1/(σ√(2π)), this amounts to scaling by σ√(2π), or equivalently just dropping the normalizing constant.
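
A sketch of the per-attribute similarity in that unnormalized form, whose peak is already exactly 1.0 at x = μ:

(defn gaussian-similarity
  "Unnormalized Gaussian centered at mu with width sigma; peaks at 1.0."
  [mu sigma x]
  (Math/exp (- (/ (Math/pow (- x mu) 2)
                  (* 2 sigma sigma)))))

(gaussian-similarity 0.5 0.1 0.55) ;; ~0.88, which would pass a 0.75 threshold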

This method could also potentially be used for determining the “similarity” space for a set of multiple vectors (such as a user’s previously enjoyed games), but more on that later.

While these methods may be very effective for finding games that are very similar in terms of their attributes, much of what users desire involves variety in games. If the only recommendations made are games that are nearly identical, will the user benefit? Also, are there intangible factors, hard to represent numerically or via a set of rules, that make games recommendable in relation to other games? One solution to this issue is to use a machine learning system.

With a machine learning system, the experts themselves can train the system to identify games that would be appropriate for recommendation, the hope being that with enough input, the system will produce results similar to those of an expert. This may be accomplished through, for example, a neural network. First, the expert would be given an interface that provides two games, chosen randomly or by the expert; let’s call them Game A and Game B. The expert would then choose a value via some sort of slider UI element, with “Would never recommend” on one end of the spectrum and “Would highly recommend” on the other, indicating how strongly they would recommend Game B to a user who considerably enjoyed Game A. The input pattern to the neural network would be the union of the two games’ attributes, such as the following:

< a1, a2, a3, a4 … an, b1, b2, b3, b4 … bn >

The network would then be trained with the output being a single floating-point value, with 0 corresponding to “never recommend” and 1.0 corresponding to “highly recommend.” Once a sufficient amount of training data had been provided, the system should hypothetically be able to return, given two games, a confidence factor that a user who enjoyed the first would also enjoy the second.
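
Constructing a single training pattern from two game vectors and the expert’s slider value would then look something like this (the map shape is illustrative):

(defn training-pattern
  "Join two game attribute vectors into one input pattern, with the
  expert's slider value t in [0.0, 1.0] as the ideal output."
  [game-a game-b t]
  {:input (vec (concat game-a game-b))
   :ideal [t]})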

The problem with this method lies in the performance of the system. If a certain number of recommendations are to be made, it would be preferable that those recommendations have the highest outputs from the neural network. This means that in order to find the best matches for an arbitrary game, the entire dataset has to be fed through the network, which will be very inefficient for a large dataset. A possible solution to this problem is to have a process running in the background that updates a sorted set of network outputs for each vector, using some sort of caching system such as Memcached or Redis for storage. This way, the best results can be pulled quickly from the cache.

It would be necessary to also keep a database of the experts’ training data. This way, the network can be trained over as many epochs as necessary. This would also allow for different networks to be tested and trained with various architectural choices such as varying numbers of layers and learning rates. The training data could also be used for some other type of machine learning device, such as a Support Vector Machine.

Problem Domain & Motivations

Problem Domain

Board games have been around for thousands of years and are enjoyed by countless people. The different types of games are innumerable, from something as simple as checkers to something as complex as an advanced role-playing game. These games possess many qualities and attributes, including, but not limited to, number of players, complexity, age group, mechanics, and player skill level. With such a diverse set of criteria to describe games, as well as the large number of games available, one can see how it would be difficult to classify games and recommend ones that players will enjoy. Whether based on a set of attributes the player desires, or on the player’s history of games they have enjoyed in the past, it would be very beneficial to board game enthusiasts to have access to a system that could provide such recommendations.

Knowledge Base

Expert knowledge in the problem domain of board game recommendation can come from a variety of sources. Human experts primarily include board game designers and serious board game enthusiasts; however, in order to be comprehensive, the opinions and experiences of casual players should also be considered. The logical rules and decisions that these experts make in choosing which games to play will form a large part of a system designed to solve this particular problem.

Sources of facts and data regarding the problem domain can also come from non-human sources such as BoardGameGeek. This website contains a large, publicly accessible database of games, descriptions, and ratings. This data can be incorporated, as facts or otherwise, into a system that provides recommendations to its users.

Project Motivation

One of the team’s members, Alex Burkhart, is actively involved in the Columbus Area Boardgaming Society. He mentioned that with such a large library (over 1000 games), it would be very beneficial if some sort of application existed to recommend board games to prospective players. While resources such as BoardGameGeek exist, there is no user-friendly interface for filtering and evaluating the available data to generate relevant recommendations. The team decided that this problem offered a challenging, flexible, and extensible project opportunity, and would also provide a complementary service that would be frequently used and appreciated.