I have a database of items and sizes. I wan't to group them into groups of X objects with a total size of <=Y but as close to Y as possibleHere are some example values (there are actually 2k of them)109114542585867166625214677117367048I want Put them into groups of 60 items with a group total value of 6000.My first though is grab the largest and then the smallest, and keep doing that to the center median value. but ideally i would actually have them kinda averaged out.There is probably some smart/packing algorithm or something that I could use here, but I'm not that savvy anymore..Any thoughts
2/17/2015 7:30:28 PM
So your givens are the number of items in a group and the group total value (and the large set of numbers to pull from)?
2/17/2015 7:50:08 PM
if you can assume the set of values isn't ordered, as your set is listed, then the most (compute) efficient way to do that would be to grab the next element from the list and add it to the group until the next value would take you over. a couple nested loops should do it. if you wanted to add packing efficiency as well, just have it try list.next(), list.next().next(), and list.next().next().next(), so you bound any run-away searches. should be pretty close to O(n)
2/17/2015 8:29:13 PM
It probably depends on how much you care about getting close to an optimal solution. A solution like the one outlined in ^ would probably be fine if you want a quick solution. I think you would probably do a little better in terms of average distance from your upper bound if you first sorted your item size list. Create a priority queue of size 6000/60 = 100 and start adding the biggest items first to this priority queue. The PQ should be sorted on the basis of total item size for each element in the queue. Add the next biggest item from the list to the top of the PQ (i.e. the element with the smallest total size) and repeat. If an element hits a size of 6000 exactly, remove it from the PQ.If you posted this to http://stackoverflow.com, I bet you would get some answers that would be nearly optimal within an hour.
2/17/2015 8:52:49 PM
Visualize your numbers as a distribution. You have 33 buckets to pluck from.To keep things even, you have to drain the buckets by pulling from pairs in the nth bucket and n+33/2 bucket .You'll have to calculate a distribution for your values though to use this (which shouldn't be hard really).
2/17/2015 9:19:10 PM
does need to be optimal for allocation however, compute/caluation optimization isn't really important.I don't want to really brute force it... Also my code has to be in powershell.Right now this is my current planSince I know that the average Value of the group=100Create a group, get the first Item from the list add that to the groupfind a value that has not been assigned to a group such that (Group+New Value) is as close to 100 as possible without putting the sum(group)>6000Add that value to the group, continue till the group has 60 items in it or the sum of the group>6000then create a new group, and grab the first free value.Again, probably horrific from a compute standpoint, but I think it will get me near where i want to be.on a simplified set of where say we wanted 5 items per group with a sum of 500, using the data i posted earlierGroup 1 would be something like109,86,114,77,117Group 2 would be something like54,146,70,117, et cetra
2/17/2015 9:20:27 PM
This looks like a variation of the "k-partition problem" which is well-studied. I did a quick googling and found "The Easiest Hard Problem: Number Partitioning" by Stephan Mertens which presents some methods for partitioning a set into 2 subsets. You might try looking in math or computer science journals.
2/18/2015 12:48:36 PM
^ thanks for the tip.That makes sense,Looking at solutions for the "Balanced Partition Problem"which i can recursively call on the set breaking them into 2 sets of about equal size and count, then keep subdividing those sets.[Edited on February 18, 2015 at 3:11 PM. Reason : dd]
2/18/2015 3:04:30 PM
sounds like a data structures class i've seen this before but my mind isn't functioning right now.
2/18/2015 4:33:25 PM
so i think i got the follwoing powershell code working for the most part$myArray = 1,6,2,4,5,9,1$ArraySize=$myarray.count$Sum=($myArray|Measure-Object -Sum).sum$sol=new-object bool[] ([math]::Round($sum/2+1,[System.MidpointRounding]::AwayFromZero))$sol[0]=$true#for ([int]$i=0;$i -lt $ArraySize;$i++){foreach($i in $myArray){for ([int]$j=$sum/2;$j -ge $i;$j--){##-Host IValue:$i#Write-Host JValue:$jif ($sol[$j-$i]){$sol[$j]=$true} #End If}#End J Loop}#End I Loop[int]$HalfSumCloser=([math]::Round($sum/2,[System.MidpointRounding]::AwayFromZero))For(;($sol[$HalfSumCloser]) -ne $true{$HalfSumCloser--}(($sum - $halfsumcloser) - $halfsumcloser)but that tells me if i can do it, not how to actually do it.
2/18/2015 5:21:03 PM
trying to implement thishttp://people.cs.clemson.edu/~bcdean/dp_practice/dp_4.swfYea.... i haven't done this type of programming problem in 12 years... if ever.
2/18/2015 6:17:37 PM
$myarray=@()$myarray+=120for($i=0;$i -lt 10;$i++){$myarray+=Get-Random -Minimum 80 -Maximum 150}$best=($myarray|Measure-Object -Sum).Sum$target=$best/2write-host Starting work arraysum $best$bests=@()$p=@{}$p[-1]=@{0=@()}for ([int]$i=0;$i -lt $myArray.Length;$i++){$x=$myArray[$i]$p[$i]=@{}foreach ($j in $p[$i-1].Keys){$p[$i][$j]=$p[$i-1][$j]$p[$i][$j+$x]=$p[$i-1][$j]+@($x)#$pif([math]::Abs($j+$x-$target) -lt [math]::Abs($best-$target) -and ($p[$i][$j+$x]).count -eq [int]($myArray.count/2)){$best=$j+$x#write-host Best $best#write-host besti $i $j $x$bests=$p[$i][$j+$x]}}}Write-Host Solution error: ($best-$target)write-host Original Array ($myarray|Sort-Object)write-Host Group1 (($bests|Sort-Object) -join ', ')Write-Host group1 sum ($best|measure-object -Sum).Sum[System.Collections.ArrayList]$group2list=$myArrayforeach($value in $bests){$group2list.Remove($value)}Write-Hostwrite-Host Group2 (($group2list|Sort-Object) -join ', ')Write-Host group2 sum ($group2list|measure-object -Sum).Sum[Edited on February 18, 2015 at 11:59 PM. Reason : BUGS!]
2/18/2015 11:51:28 PM
one note on this,on a set of 1000 buckets this thing DRINKS memoryBasically re-wrote this python code in pshttps://bj0z.wordpress.com/2011/03/07/the-balanced-partition-problem/[Edited on February 19, 2015 at 12:22 AM. Reason : phython]
2/19/2015 12:20:43 AM
okay so screw those bastardshere is how i'm doing it, seems 1000 times faster, although it might not be 100% accurate$myarray=@()for($i=0;$i -lt 1000;$i++){$myarray+=Get-Random -Minimum 80 -Maximum 150}[System.Collections.ArrayList]$Set1=@()[System.Collections.ArrayList]$Set2=@()for([int]$i=0;$i -lt $myArray.Count/2;$i++){$set1.Add($myArray[$i])}for([int]$i=$myArray.Count/2;$i -lt $myArray.Count;$i++){$set2.Add($myArray[$i])}$s1sum=($set1|Measure-Object -Sum).Sum$s2sum=($set2|Measure-Object -Sum).Sum$setdifference=$s1sum-$s2sumfor($i=0;$i -lt $set1.Count;$i++){if($setdifference -eq 0){break}$setdifference$s1item=$set1[$i]for($J=0;$j -lt $set2.Count;$j++){$s2item=$set2[$j]if([math]::Abs((($s1sum-$s1item+$s2item)-($s2sum-$s2item+$s1item))) -lt [math]::abs($setdifference)){#write-host Change$set1.Remove($s1item)$set1.Add($s2item)$set2.remove($s2item)$set2.Add($s1item)$s1sum=($set1|Measure-Object -Sum).Sum$s2sum=($set2|Measure-Object -Sum).Sum$setdifference=$s1sum-$s2sum$i=-1break}}}write-host TotalSum ($myarray|Measure-Object -sum).sumwrite-host StartArray (($myarray|Sort-Object) -join ', ')write-host Set1Sum ($set1|Measure-Object -sum).sumwrite-host Set1 (($set1|Sort-Object) -join ', ')write-host Set2Sum ($set2|Measure-Object -sum).sumwrite-host Set2 (($set2|Sort-Object) -join ', ')
2/19/2015 1:47:43 AM
You said you want groups of 60 close to 6000, this implies the mean of all your values is more or less 100. Is this for certain? With a mean reasonably far away from 100 I think you'd end up with a lot of values that you couldn't get to add up to 6000 in groups of 60.
2/20/2015 6:09:21 PM
^just an example, right now values are between 80 and 300, with a mode around 120.Actually trying to load balance file servers to datacenter DFS servers
2/21/2015 12:38:29 AM