Knee and a Half: Ultimate - Hat Tournaments - Making better teams, faster

Previously:
In the first part of this post, I looked at how we rate ourselves as Ultimate players. I showed that the way players rate themselves correlates weakly with how an observer rates them, too weakly to be used to achieve accurate ratings.

The post prompted some debate on my Facebook page - most interesting was Eric's take. You should go read the whole thing, but in a nutshell, he suggests ditching the numeric system and switching to an archetype-based one. I'll wait while you read the whole thing...

It will almost certainly be easier for most players to rate themselves this way. Another idea, posted in a comment by my father, is to give examples for the numeric ratings. For example, for the experience rating:

1. I've been playing for a few months.
2. I've been playing for over half a year, and play occasionally with friends.
3. I play on a club here in Israel, and practice at least once most weeks.
4. I played on the national team / top level club in Israel, or played for my college in the states.
5. I played club or top level college Ultimate in the states.

That would force people to rate themselves on our scale, not theirs. I picked the easiest example for it - but you can definitely write a similar scale for disc skills, athleticism, and speed as well. UI-wise, it should be a set of radio buttons rather than a text box, but that's a different story...

I love Eric's idea - it's very cool and intuitive. However, there is a big advantage to forced quantification of the data - it allows me to automate a big part of the team generation process.

What I did:

I started by going through the players' ratings, and distilling them into a position and a rating. The position is a handler, a cutter, or a girl (with teams having only two or three girls each, it's not worth splitting them further). The rating is a number between 1 and 9 - where, honestly, a 1 is subtraction by addition, and a 9 is the highest level found in Israel. [I'm not actually sure this is the best method - see the note at the end]. We recently also started saving the ratings we end up giving people - so they can be tuned after tournaments, and to help us 'remember' players we've already met.

Using these ratings, I can now start making teams. I have two obvious goals when making teams - split the handlers, cutters, and girls roughly equally between the teams, and keep the total ratings around equal as well. A few other goals include keeping the average ages of teams close (so I don't throw a team of kids against a team of oldies), ditto with average heights, and making sure each team has a "leader", or more generally, worrying about chemistry.

My team generation algorithm is pretty simple:

Go through each pool of players (handlers, cutters, girls):
Within each pool, select a random player from among those with the highest remaining rating (the randomness is to cause some variation in the generated sets of teams)
Assign this player to the team with the lowest total rating so far. If there is more than one option, again, choose randomly.
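
Here's a minimal Python sketch of the steps above. It is not the actual script: the `Player` class, the `make_teams` name, and the `rng` parameter are made up for illustration, and it assumes that "total rating so far" means a team's running total across all pools.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    pool: str    # "handler", "cutter", or "girl"
    rating: int  # 1 (weakest) to 9 (strongest)


def make_teams(players, n_teams, rng=random):
    """Build one suggestion: a list of n_teams lists of players."""
    teams = [[] for _ in range(n_teams)]
    totals = [0] * n_teams  # running total rating per team (assumption: across all pools)

    # Group the players by pool, keeping the pools in first-seen order.
    pools = defaultdict(list)
    for player in players:
        pools[player.pool].append(player)

    for pool in pools.values():
        remaining = list(pool)
        while remaining:
            # Pick randomly among the players tied for the highest rating.
            top = max(p.rating for p in remaining)
            pick = rng.choice([p for p in remaining if p.rating == top])
            remaining.remove(pick)

            # Give the pick to a team with the lowest total so far,
            # breaking ties randomly.
            low = min(totals)
            team = rng.choice([i for i, t in enumerate(totals) if t == low])
            teams[team].append(pick)
            totals[team] += pick.rating
    return teams
```

Calling `make_teams(players, 4)` repeatedly gives a different suggestion each time; passing a seeded `random.Random` as `rng` makes a run reproducible.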

I generate a large number of such suggestions, and then give them each a score. My (admittedly very simplistic) scoring algorithm is based on variances:

Suggestion score = 2 x [variance of teams' total scores] + [variance of teams' average ages] + [variance of teams' average heights]

The lower the score, the better. I wanted the ratings to weigh the most, while still taking heights and ages into account. Obviously, this is also inexact (just as people fill in their ratings inaccurately, they may do the same with their heights, and to a lesser extent, ages).
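
As a rough sketch, the scoring and the "pick the lowest" step could look like this in Python. The function names and the dict keys (`"rating"`, `"age"`, `"height"`) are placeholders, and population variance (`statistics.pvariance`) is an assumption; sample variance would rank the suggestions the same way as long as the number of teams is fixed.

```python
from statistics import mean, pvariance


def suggestion_score(teams):
    """Score one suggestion; lower is better.

    Each team is a non-empty list of dicts with the (hypothetical)
    keys "rating", "age", and "height".
    """
    totals = [sum(p["rating"] for p in team) for team in teams]
    avg_ages = [mean(p["age"] for p in team) for team in teams]
    avg_heights = [mean(p["height"] for p in team) for team in teams]
    # Ratings count double; ages and heights count once each.
    return 2 * pvariance(totals) + pvariance(avg_ages) + pvariance(avg_heights)


def best_suggestion(suggestions):
    """Return the suggestion (a list of teams) with the lowest score."""
    return min(suggestions, key=suggestion_score)
```

One thing to keep in mind with a sum like this: the units matter. Heights in centimeters produce far larger variances than heights in meters, so the units quietly change how much each term counts.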

Okay, what now:

I take the suggestion of teams with the lowest score, and start working from there. This is the awesome part - instead of spending an hour or two (and sometimes double that) assigning players to teams, I can (almost) immediately jump to fine tuning. This includes several steps:

Making the teams have equal (or approximately equal) numbers of players. (If you feel like it, see if you can figure out why it wouldn't always happen with the algorithm I use)
Verifying that the teams with fewer or weaker handlers also get cutters who feel more comfortable behind the disc, or girls who handle. [This might also be solved by the other rating algorithm I have in mind - again, see the note at the end]. This is crucial - with our generally low level of disc skill (at least, compared to the US), a team with noticeably weaker handling is at a significant disadvantage.
Chemistry considerations. This is by far the hardest, most voodoo-like part of the process. I think I got it mostly right on the last ligat cova, but there are so many unknowns that you never really know. I try to make sure each team has at least one experienced player, who doesn't mind leading and being vocal about it, and that each team has a mixture of younger and older talented players.

Tweaking with those considerations usually takes around half an hour, which means the whole team generation process takes around an hour. Last, but certainly not least, I have another set of eyes (or two) review the teams, to make sure I didn't do anything overly silly (such as forgetting a player, thinking a girl is a guy, or absolutely mis-rating a player. Purely hypothetical examples, of course...).

I took a process which used to take three to four hours of total pain in the ass, and turned it into a relatively smooth ordeal which takes an hour or less to complete. I can live with that :)

I wrote the team generation script in Python, and tend to tweak it before each tournament. Just for kicks, I think I might write a quick GUI to make the fine tuning easier (and because I feel like learning to work with PyQt).

I'd love to hear your thoughts on this - what might I be doing wrong? How can we do it better? What would you change?

-Guy

Notes:

Another rating system: I messed around with keeping our ratings on another scale. Instead of keeping a single 1-9 rating, keep two. Rate each player as both a handler and a cutter. That would better capture the cutters who have the disc skills and decision making to handle if need be, as well as differentiate better between the handlers who are old and slow (*cough* Mickey *cough*), and those who can also cut if it serves the team better.

If we do that, then we no longer need to split the guys into handlers and cutters. It would require modifying the team generation algorithm somewhat - I haven't done it yet, but it certainly doesn't seem like a very tough problem. Of course, manual tweaking would still be necessary - but maybe less so than before. What do you think? Might it work better?
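
To make the idea concrete, here is one possible (and untested in real life) way the assignment could work with two ratings per player. Everything in it is an assumption for the sake of illustration: the `assign_dual` name, the `(name, handler_rating, cutter_rating)` tuples, and the choice to place each player on whichever team keeps the handler totals and the cutter totals as even as possible.

```python
from statistics import pvariance


def assign_dual(players, n_teams):
    """players: list of (name, handler_rating, cutter_rating) tuples."""
    teams = [[] for _ in range(n_teams)]
    handler_totals = [0] * n_teams
    cutter_totals = [0] * n_teams

    def imbalance_if_added(i, h, c):
        # How uneven both sets of totals would be with this player on team i.
        h_totals = handler_totals.copy()
        c_totals = cutter_totals.copy()
        h_totals[i] += h
        c_totals[i] += c
        return pvariance(h_totals) + pvariance(c_totals)

    # Place the strongest players (by combined rating) first.
    ordered = sorted(players, key=lambda p: p[1] + p[2], reverse=True)
    for name, h, c in ordered:
        best = min(range(n_teams), key=lambda i: imbalance_if_added(i, h, c))
        teams[best].append(name)
        handler_totals[best] += h
        cutter_totals[best] += c
    return teams, handler_totals, cutter_totals
```

This version has no randomness and doesn't look at team sizes, ages, or heights, so it would still need the same "generate many, score, then tweak by hand" treatment as the current algorithm.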

Thursday, December 8, 2011
