TWW - Statistics question (or linear programming)

go to bottom

Message Boards » » Statistics question (or linear programming)

Page [1]

1985
All American
2175 Posts
user info
edit post

This is probably a linear programming question, and probably easy, but ill ask anyway

I have a list of objects (call them OBJ) and two lists of prices for those objects, taken at different time periods. I want to partition the objects into two groups such that the difference of the average of the differences associated with each group is maximized

e.g MAX( avgGroup1(P2-P1)- avgGroup2(P2-P1))

So we're maximizing the difference between groups of the average price difference amongst the groups

make sense?

6/30/2009 1:50:21 PM

neolithic
All American
706 Posts
user info
edit post

Does this need to be guaranteed optimal and does the process have to be deterministic?

6/30/2009 2:10:50 PM

1985
All American
2175 Posts
user info
edit post

^ No, it doesn't have to be either.

It does have to be 'near optimal' though. I don't know what the word for it is, but suppose i give it a (reasonable) parameter, it can't be off of the optimal solution by more than that parameter. Ie. it can't have a probability of giving me the worst solutions.

I know this is going to be some simplex method bullshit that I don't want to do....

6/30/2009 2:27:42 PM

neolithic
All American
706 Posts
user info
edit post

^ That makes this a lot easier, nearly optimal is something relatively easy to do.

Are you familiar with Evolutionary Algorithms (EAs)? Basically you start with an initially (poor) solution and then evolve it until you have a good one. This problem sets itself up well for an EA I believe.

This is basic idea of what you would do:

1) - Randomly split the list into two groups. Measure the fitness of this solution, which in this case is:
avgGroup1(P2-P1)- avgGroup2(P2-P1)

2) - Create a new generation of solutions by randomly permuting your initial one. Swapping members from one group with ones from the other or taking (no exchange) a member from one group and putting it in the other. Calculate the relative fitness for one of these new offspring.

3) - Allow selected (there are a few different methods of selection you can use) members of the population to be permuted/reproduce to create another generation of solutions.

4) Repeat 3 for X generations or until you have solution convergence.

This may not be for you, but I believe it would work really well for your problem. You could probably do this in only a few lines of code too.

6/30/2009 2:42:52 PM

Shadowrunner
All American
18332 Posts
user info
edit post

I may be misunderstanding what you can choose here, but it sounds fairly simple.

Suppose you have N objects total, and that you assign k objects to group 1 and N-k objects to group 2. For any given k, the groups that maximize your objective function can be constructed by rank ordering your objects according to the difference in the prices at each time (let a_1 be the object with the largest price difference, all the way through a_N the object with the smallest difference). Then the solution must be either assigning a_1 through a_k to group 1, or assigning a_(N-k+1) through a_N to group 1 (I'm assuming here that group labeling doesn't matter, so you're actually interested in the maximum absolution value of the difference).

If you don't have a constraint on the size of the groups (i.e., partitions don't have to be of equal size), then you next need to choose the value of k that maximizes your objective function. Suppose you initially have optimal groups containing k and N-k objects. Adding another object to group one, resulting in k+1 objects, results in adding an object with a smaller difference in price than the group average with only k objects. Group 2 also loses an object with a larger difference in price than the group 2 average; therefore the difference between Group 1 and Group 2 becomes smaller than it was with only k objects. This holds in general, so the optimal choice of k is 1.

7/8/2009 1:29:23 AM

neolithic
All American
706 Posts
user info
edit post

^ Are you saying that it is always optimal to assign one member to a group by itself and assign all others to the other group?

7/8/2009 10:42:37 AM

1985
All American
2175 Posts
user info
edit post

^^ Yeah, thanks. Now that I'm thinking about it, it is pretty simple. if the groups don't have to be the same size (they don't) I can simply start with the n-1 largest elements and the smallest. find the difference of avgs and compare that to the n-2 largest and the 2 smallest. if it increases the difference, continue, if not, stop. the act of swapping changing the smallest element is monotone, so I wont ever get pidgin holed into a wrong answer. (I don't think)

^ That cant be true, consider 100,100,1,1

I guess this problem gets harder when you have arbitrarily many groups and want to maximize the average difference between each group pairwise... or maybe it doesnt.
[Edited on July 8, 2009 at 10:54 AM. Reason : .]

[Edited on July 8, 2009 at 10:58 AM. Reason : .]

7/8/2009 10:53:26 AM

Shadowrunner
All American
18332 Posts
user info
edit post

^^Sorry, I was implicitly assuming that the price differences were all distinct. If that's not the case, then you would rank order them as before, and if you are including an object in your group, you should also include any other object with the same price difference. That solves problems caused by sets like {100,100,1,1} as 1985 proposed.

At first glance, I think this would only matter if the objects with identical price differences are the maximal or minimal price differences for the set of all objects. But yes, I think the solution should be that one group contains all the objects that have the maximal [or minimal] price difference.

This is pure intuitive speculation, but generalizing to more than two groups might work as an iterative solution. Maximize two groups as outlined above; choose a third group as all objects from the second group that are on the opposite end of the ordering from the objects in the first group. In other words, if you choose the first group by taking the objects with the maximal price difference, the third group might be taking all objects with the minimal price difference, leaving the second group to be the rest in the middle.

This does depend somewhat on your choice of metric when generalizing; are you thinking about maximizing the average pairwise difference in average price differences? You could also minimax it, or use some other metrics.

Going beyond three groups, I would extend that to alternating picking off the highest and lowest price difference objects from what remains ungrouped, until you get to the specified number of groups. Then you just need to test whether your initial group should be the maximal or the minimal price difference objects. Again, just intuition, I haven't thought much about it.

This is probably less a linear programming problem, and more about partitions of ordered sets.

7/8/2009 3:53:43 PM

neolithic
All American
706 Posts
user info
edit post

Quote :
"
^^Sorry, I was implicitly assuming that the price differences were all distinct. If that's not the case, then you would rank order them as before, and if you are including an object in your group, you should also include any other object with the same price difference. That solves problems caused by sets like {100,100,1,1} as 1985 proposed.
"

What about the set {100,100,2,1} instead? Are you saying that group1 = {100,100,2} and group2 = {1} is optimal? Based on my understanding of the problem this would yield:

Average of group1: (100+100+2) / 3 = 67.3333333
Average of group2: = 1
Difference = 67.3333333 - 1 = 66.3333333

If instead you had selected group1 = {100,100} and group2 = {2,1}

Average of group1: (100+100) / 2 = 100
Average of group2: = (2+1) / 2 = 1.5
Difference = 98.5

I think that one of us is misunderstanding something, it could be me. I thought he was trying to maximize the difference between the average price differential between the two groups.

Also, I don't think an iterative approach like that would work, but this too is just intuitive speculation.

7/8/2009 4:45:10 PM

1985
All American
2175 Posts
user info
edit post

^ You got it right, that is what I mean and your counter example works.

well, almost, youd have to use 101,100,2,1 to statisfy his conditions

[Edited on July 8, 2009 at 5:56 PM. Reason : .]

7/8/2009 5:54:33 PM

neolithic
All American
706 Posts
user info
edit post

That's what I thought. Anyway, the most reasonable solution to me still seems to be to use something from the optimization toolbox (like an EA).

You can't use linear programming, because this it too dynamic (i.e. you can't come up with a set of deterministic equations beforehand to solve).

You can't use any the well known clustering or partitioning algorithms because you really don't care about clustering. If you wanted to group objects into two groups such the variance was minimized you could use these. This also rules out and sort of two dimensional projection and using splines.

You can't use any of the function minimization techniques like Gauss-Newton for the same reason you can't use LP.

I think that the heuristic that Shadowrunner suggested is probably the easiest to do, though I don't know if it will always yield the best answer, and I don't think it generalizes to higher dimensions, for the same reason that shortest path algorithms don't work by greedily taking the least expensive hop between two nodes. Usually there is a global minimum that can not be found by using pair-wise minimization/maximization.

All that being said, I think if you ever have to do this for more than two lists something like an EA is your best bet. They are pretty scalable and robust. If you want any information on these I can point you in the right direction.

Also, is this a purely academic example or are you actually using this for something?

7/8/2009 6:10:36 PM

Shadowrunner
All American
18332 Posts
user info
edit post

No, my algorithm for {100,100,2,1} would give the solution Group 1 = {100,100} and Group 2 = {2,1}. What I had originally said was

Quote :
"Then the solution must be either assigning a_1 through a_k to group 1, or assigning a_(N-k+1) through a_N to group 1"

meaning that you should check two cases for your singleton group: the objects with the maximal price difference or the objects with the minimal price difference.

After neolithic helpfully pointed out that you shouldn't assume prices are all distinct, I amended the approach to saying that the singleton set should contain all the objects with the maximal [or minimal] price difference, not just one of those objects.

So when faced with the set {100,100,2,1}, you would compare two cases. The maximal case would be {100,100} and {2,1}, and the minimal case would be {1} and {100,100,2}. Clearly, the maximal case has the higher average difference.

1985's counterexample of {101,100,2,1} is good though, and it shows that my stopping rule is incorrect. The optimal groups are {101,100} and {2,1} even though my suggestion that a singleton set should be best would result in comparing {101}, {100,2,1} to {1}, {101,100,2} and then stopping.

So I think 1985's stopping rule from this quote is the way to go:

Quote :
"I can simply start with the n-1 largest elements and the smallest. find the difference of avgs and compare that to the n-2 largest and the 2 smallest. if it increases the difference, continue, if not, stop."

At worst, you have n calculations of differences in group averages--2 comparisons for each group size from 1 to n/2. The worst case is where each price difference is distinct, and in that case, for each size of Group 1 from 1 element up to n/2 elements (beyond that is symmetric), you need to calculate the averages from the "maximal" groups and the "minimal" groups.

So brute forcing this without a stopping rule seems pretty doable unless your lists of objects are absolutely huge. The key observation is that for any size k of Group 1, rank ordering the price differences.allows you to only consider two permutations--the group of k largest price differences and the group of k smallest price differences.

[Edited on July 8, 2009 at 7:19 PM. Reason : ]

7/8/2009 7:10:22 PM

Shadowrunner
All American
18332 Posts
user info
edit post

I think that if you do check the two permutations for each possible size of groups, that should always result in the best answer because it results in the best answer for any given group size. Just wrap the algorithm in a loop for size 1 to size N/2, store the results in an array along with an indication of whether the best solution for a given size was the maximal or minimal group, pull out the array index of the maximal element of the array, and you have your winner.

I have a hunch that the best group size is related to the largest price difference between elements in the ordering. For example, in 1985's set of {101, 100, 2, 1}, the largest price difference is between 100 and 2, so that suggests the optimal partition is the one that splits at that point. Any quick counterexamples by inserting new numbers in between 100 and 2?

7/8/2009 7:17:14 PM

1985
All American
2175 Posts
user info
edit post

^ I had the same hunch, ill try to think up a quick proof/counter example

P.S. Thanks for the awesome thread guys. I appreciate the input. This is a problem I ran into at work, and while it doesn't really need to be solved for me to continue with work, I found it kind of fun.

[Edited on July 9, 2009 at 11:03 AM. Reason : .]

7/9/2009 11:02:31 AM

Message Boards » Study Hall » Statistics question (or linear programming)

Page [1]

go to top

Admin Options : move topic | lock topic