Writing Arrays To File

From Robowiki
Revision as of 08:18, 24 May 2009 by Robobot (talk | contribs) (Robobot 0.1 : correcting user page links)

Page on Old Wiki: WritingArraysToFile

The simple type of GuessFactorTargeting in my TityusMega bot uses a multidimensional integer array to store the "most visited" count of each factor. Much like Fhqwhgads, but unlike Fhqwhgads I save the data using a variant of Kawigi's Compressed Serialization. Like so:

    // Imports needed: java.io.*, java.util.zip.*, robocode.RobocodeFileOutputStream
    private static int[][][][] aimFactors;
.
.
.
    void restoreFactors() {
        try {
            ZipInputStream zipin = new ZipInputStream(new
                FileInputStream(getDataFile(enemyName + ".zip")));
            zipin.getNextEntry();
            ObjectInputStream in = new ObjectInputStream(zipin);
            aimFactors = (int[][][][])in.readObject();
            in.close();
        }
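        catch (IOException e) {
            // No readable data file for this enemy yet: start with empty stats.
            System.out.println("Ah! A new acquaintance. I'll be watching you " + enemyName + ".");
            aimFactors = new int[ACCEL_SEGMENTS][DISTANCE_SEGMENTS][POWER_SEGMENTS][AIM_FACTORS];
        }
        catch (Exception e) {
            // Anything else, e.g. a ClassNotFoundException from readObject().
            e.printStackTrace();
        }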
    }

    void saveFactors() {
        try {
            ZipOutputStream zipout = new ZipOutputStream(new RobocodeFileOutputStream(getDataFile(enemyName + ".zip")));
            zipout.putNextEntry(new ZipEntry(enemyName));
            ObjectOutputStream out = new ObjectOutputStream(zipout);
            out.writeObject(aimFactors);
            out.flush();
            zipout.closeEntry();
            out.close();
        }
        catch (IOException e) {
            System.out.println("Error saving factors:" + e);
        }
    }

As I understand very little of Java's I/O model, I just hacked on the serialization code until it compiled. It works. But if someone with more understanding sees a danger here, please feel free to improve the code. -- PEZ

Looks to me like it should work. This is similar to what FloodMini does, actually (except I use the GZIP streams instead of the Zip streams). FloodHT does it a little more efficiently by making a series of nested loops and using out.writeInt() and out.readInt(). -- Kawigi
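The nested-loop approach Kawigi mentions could look roughly like the sketch below. This is not FloodHT's actual code: the class name FactorCodec and its methods are made up for illustration, and it writes to a byte array so it can be tried outside Robocode. It assumes a rectangular array whose dimensions the reader already knows, since no dimensions are stored in the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not taken from any bot: stores every int of a
// rectangular 4-D array in a fixed order, compressed with GZIP.
final class FactorCodec {
    static byte[] encode(int[][][][] factors) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the DataOutputStream also finishes the GZIP stream.
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            for (int[][][] a : factors)
                for (int[][] b : a)
                    for (int[] c : b)
                        for (int v : c)
                            out.writeInt(v);
        }
        return bytes.toByteArray();
    }

    // The caller supplies the dimensions because the data holds only raw ints.
    static int[][][][] decode(byte[] data, int d0, int d1, int d2, int d3) throws IOException {
        int[][][][] factors = new int[d0][d1][d2][d3];
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            for (int[][][] a : factors)
                for (int[][] b : a)
                    for (int[] c : b)
                        for (int i = 0; i < c.length; i++)
                            c[i] = in.readInt();
        }
        return factors;
    }
}
```

In a robot, the byte array streams would presumably be replaced by a RobocodeFileOutputStream and a FileInputStream on getDataFile(...), as in the code at the top of the page. The cost PEZ points out remains: the loops must match the segmentation exactly.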

I considered this too, but firstly I don't need it yet (my files are 1.7k per opponent at the moment, after 10k rounds) and secondly I would have to rewrite the save/restore functions each time I changed my mind about the segmentation. And I change my mind often. =) -- PEZ

I am wondering about the compressibility of GZIP versus ZIP. Are they the same, or is one better? -- iiley

I will try this, but my bet is that if there is a difference it will be small. To solve your storage size problems you probably need to look elsewhere. Maybe you can e-mail me details on how you store things internally and externally today and I can help you ponder on a solution? -- PEZ

My guess is that Zip compresses better if you tell it to. With the zip libraries, you can set the compression level to favor either speed or the smallest output. -- Kawigi
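For anyone wanting to try this, here is a minimal sketch of setting the level on a ZipOutputStream. ZipOutputStream.setLevel() and the Deflater constants are standard java.util.zip; the class name ZipLevelDemo, the entry name "data" and the byte-array target are just assumptions for the demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: zips one payload at a chosen compression level.
final class ZipLevelDemo {
    static byte[] zip(byte[] payload, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipout = new ZipOutputStream(bytes)) {
            // E.g. Deflater.BEST_SPEED or Deflater.BEST_COMPRESSION.
            zipout.setLevel(level);
            zipout.putNextEntry(new ZipEntry("data"));
            zipout.write(payload);
            zipout.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

Comparing zip(data, Deflater.BEST_SPEED).length with zip(data, Deflater.BEST_COMPRESSION).length on real stats would show how much the setting matters for a given bot. No figures are claimed here.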

Add VisitCountStats/LimitFileSize to this and you have really small files on disk. -- PEZ

What do I have to import to get this to work? It seems java.io isn't enough, and I can't see anything else in the API... -- Tango

Scratch that - found it! -- Tango

Also consider looking at Tityus save/restore functions. They are more straightforward and CodeSize friendly. Using gzip files like SandboxMini. -- PEZ

I guess you could save even more space if you don't create a file per opponent but a file that contains a hashtable with the data of all opponents. --deathcon

That's what DT does, I think (at least, it only has one file in its data directory). I expect it is smaller, but also much slower, as you have to load the whole file each time. -- Tango

It's incredibly slow actually. In fact if you run a DT with a full data quota file in RoboLeague you'll see that RL writes out a warning message in the beginning of each battle. Something like "SandboxDT hasn't started after 600ms. Giving up."

Anyway, no need to write a hash table if you use regular zip files at least. They can contain several file entries. Probably quite easily glued onto the above code. -- PEZ
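One way the several-entries idea might be glued on is sketched below, with one zip entry per opponent. Everything here is an assumption for illustration: the class name MultiEntryZip, the flat int[] per enemy (a real bot would flatten its segments first) and the length-prefixed layout inside each entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: a single zip holding one entry per enemy name.
final class MultiEntryZip {
    static byte[] write(Map<String, int[]> statsByEnemy) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipout = new ZipOutputStream(bytes)) {
            DataOutputStream out = new DataOutputStream(zipout);
            for (Map.Entry<String, int[]> e : statsByEnemy.entrySet()) {
                zipout.putNextEntry(new ZipEntry(e.getKey()));
                out.writeInt(e.getValue().length); // length prefix (assumed layout)
                for (int v : e.getValue()) out.writeInt(v);
                out.flush();
                zipout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    static Map<String, int[]> read(byte[] data) throws IOException {
        Map<String, int[]> result = new LinkedHashMap<>();
        try (ZipInputStream zipin = new ZipInputStream(new ByteArrayInputStream(data))) {
            DataInputStream in = new DataInputStream(zipin);
            ZipEntry entry;
            while ((entry = zipin.getNextEntry()) != null) {
                int[] stats = new int[in.readInt()];
                for (int i = 0; i < stats.length; i++) stats[i] = in.readInt();
                result.put(entry.getName(), stats);
            }
        }
        return result;
    }
}
```

Note that a zip stream is read sequentially, so finding one opponent this way still means walking past the entries before it.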

But you don't have to read the hash table each round. Reading it in the first round and saving every single round works well and is not too slow. Using regular zip files, my stats sometimes get corrupted. --deathcon

I was assuming you were only loading in the first round. If you load all the data you have on every bot every round, you are going to have an extremely SlowBot. -- Tango

I think DT is slow on startup because it is creating new statistics structures on all the opponents in its file. I can probably restrict most of this, save time and reduce the memory hit. -- Paul

Why, it could be good to have if an enemy should just spontaneously appear on the battle field in mid battle. =) -- PEZ

Now *there's* an idea for a mod. Quite easy to do, too. Just put the enemy outside the field until a random time, when it appears. You have 1 challenger, and a team of magically appearing bots. -- Tango

Hmmm... there seems to be no getDataFile() so it won't compile. Do I need to import it? Or do I need to write it? --Bayen

It's defined in the RobocodeAPI for AdvancedRobot, so you need a reference to your main tank class, and call MyTank.getDataFile(). -- Voidious

Hmm, in the GuessFactors, if you keep saving/restoring the factors without 'lowering' them, won't the counts eventually get really high... is there a range for integers? And if you were to proportionally 'lower' them, how would you do it with so many segments? --Starrynte

Theoretically, yes. But if I remember right, the maximum an int can hold is around 2,000,000,000 (2,147,483,647 in Java), so I don't think we'll reach it very quickly. =) If it did, that bin would wrap around to the minimum possible number, screwing up your results BIGTIME. So a very valid worry. Good thing we aren't using arrays of shorts to store our data. -- Skilgannon

If you use RollingAverage for your stats, as many of us do, you always have a value between 0 and 1. As for having lots of segments, it doesn't really matter if each segment is in the same scale, just that all GuessFactors in a given segment are in the same scale, so you could surely just divide all the GFs in a segment when they got too big, if you needed to. -- Voidious
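Both ideas can be sketched in a few lines. The names below (SegmentScaler, registerVisit, the limit parameter) are invented for the example, and the rolling average is written in one commonly used form, which may differ from what a particular bot does.

```java
// Hypothetical sketch of the two ways to keep bin values bounded.
final class SegmentScaler {
    // Count a visit, then halve the whole segment once any bin passes the
    // limit. Only bins within one segment are compared with each other,
    // so each segment can be rescaled independently.
    static void registerVisit(int[] segment, int bin, int limit) {
        segment[bin]++;
        if (segment[bin] > limit) {
            for (int i = 0; i < segment.length; i++) {
                segment[i] /= 2;
            }
        }
    }

    // Rolling average: with newEntry equal to 0 or 1 and a start value in
    // [0, 1], the result always stays in [0, 1].
    static double rollingAverage(double value, double newEntry, double depth) {
        return (value * depth + newEntry) / (depth + 1);
    }
}
```

With the halving variant the ratios between bins in a segment are roughly preserved (integer division loses a little), which is all the gun needs when picking the most visited GuessFactor.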

  • Ok, using RollingAverage now...*discovers that there's a limit to amount of data in data file* --Starrynte
  • =) ... I'd also look at WikiTargeting / SuperNodes if you're going to be saving gun data. The basic idea is to just save the GuessFactor data for the most visited nodes. For instance, you might have 20,000 segments in your gun, but find that 95% of the time the enemy spends in just 300 of those segments. So saving the best GF for those 300 segments gives you a LOT of info about the enemy in very little space. Dookious and Phoenix use data saving in their guns based on this idea. -- Voidious
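As a rough illustration of the idea of keeping only the most visited nodes, the sketch below ranks segments by total visits and keeps the best bin for the top few. It is not the WikiTargeting code: the class name SuperNodePicker and the flattened int[segment][bin] layout are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: choose which segments are worth saving.
final class SuperNodePicker {
    // Returns up to maxNodes rows of {segmentIndex, mostVisitedBin},
    // ordered from the busiest segment downwards.
    static int[][] pick(int[][] segments, int maxNodes) {
        Integer[] order = new Integer[segments.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort segment indices by total visits, highest first.
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> -total(segments[i])));
        int n = Math.min(maxNodes, order.length);
        int[][] picked = new int[n][];
        for (int k = 0; k < n; k++) {
            int seg = order[k];
            picked[k] = new int[] { seg, argMax(segments[seg]) };
        }
        return picked;
    }

    static long total(int[] bins) {
        long sum = 0;
        for (int v : bins) sum += v;
        return sum;
    }

    // Index of the largest bin; the first one wins on ties.
    static int argMax(int[] bins) {
        int best = 0;
        for (int i = 1; i < bins.length; i++) {
            if (bins[i] > bins[best]) best = i;
        }
        return best;
    }
}
```

In the 20,000-segment example above, pick(segments, 300) would give the 300 rows to write to disk.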

Quick question, does the code above delete the old saved data before saving the new data? And if it doesn't, how? --Starrynte

I am pretty sure the above code would create a new file, overwriting any previous file for that bot (with the same name). But I'm not entirely sure on that. There is a delete() method in the File class, though, so you could do this to be sure: getDataFile(enemyName + ".zip").delete(). Note that this method will use a LOT of data for each bot, so you will not be able to save nearly as many opponents as with WikiTargeting. -- Voidious

Actually I have been giving the save problem some thought, and if you make a thread to do the work (as I think Pear does), you could save just the peaks in the data to a file. Then rebuild that data into the buffers as a type of lossy saving, but also have a degradation clause giving the average amount of drop between peaks. Otherwise just save the peaks and valleys and do a straight degradation between them. I know Dookious saves the most visited bins, but I'm not sure how. --Chase-san

  • Dooki just saves which GF bin was the most visited for each SuperNode. This means (I just checked) I can save a segment in 3 bytes for very small data files. When it restores, it uses a BinSmoothing across the other bins and gives all this data a weight of 5 (as if it saw this GF for 5 firing waves). I like your idea, but I think just saving most visited is the best use of your space. In movement, saving multiple peaks seems much more worth it, though. Just my opinion, of course - I think David Alves is also more keen on saving all the segment's data instead of just the top visited GF. -- Voidious
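To make the 3-bytes-per-segment idea concrete, here is one possible sketch. Dookious' real byte layout and smoothing function are not given on this page, so both are assumptions: two bytes for the segment index plus one for the bin, and a simple inverse-square falloff around the saved bin. The class and method names are made up.

```java
// Hypothetical sketch, not Dookious' code.
final class SuperNodeRestore {
    // Assumed layout: 16-bit segment index followed by an 8-bit bin index.
    static byte[] pack(int segment, int bin) {
        return new byte[] { (byte) (segment >> 8), (byte) segment, (byte) bin };
    }

    static int unpackSegment(byte[] b) {
        return ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
    }

    static int unpackBin(byte[] b) {
        return b[2] & 0xFF;
    }

    // Adds 'weight' at the saved bin and smaller amounts to its neighbours,
    // as if the enemy had been seen at that GuessFactor 'weight' times.
    // The falloff weight / (distance^2 + 1) is an assumed smoothing choice.
    static void restore(double[] bins, int bestBin, double weight) {
        for (int i = 0; i < bins.length; i++) {
            int d = i - bestBin;
            bins[i] += weight / (d * d + 1.0);
        }
    }
}
```

Calling restore(bins, unpackBin(record), 5) for each saved record would correspond to the weight of 5 firing waves mentioned above.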