Writing Arrays To File

Latest revision as of 02:35, 13 February 2024

Page on Old Wiki: WritingArraysToFile (http://old.robowiki.net/cgi-bin/robowiki?WritingArraysToFile)

The simple type of GuessFactorTargeting in my TityusMega bot uses a multidimensional integer array to store the "most visited" count of each factor. Much like Fhqwhgads, but unlike Fhqwhgads I save the data using a variant of Kawigi's Compressed Serialization. Like so:

    // Needs: import java.io.*; import java.util.zip.*;
    // plus robocode.RobocodeFileOutputStream and getDataFile() from the Robocode API.
    private static int[][][][] aimFactors;
.
.
.
    void restoreFactors() {
        try {
            ZipInputStream zipin = new ZipInputStream(new
                FileInputStream(getDataFile(enemyName + ".zip")));
            zipin.getNextEntry();
            ObjectInputStream in = new ObjectInputStream(zipin);
            aimFactors = (int[][][][])in.readObject();
            in.close();
        }
        catch (IOException e) {
            System.out.println("Ah! A new acquaintance. I'll be watching you " + enemyName + ".");
            aimFactors = new int[ACCEL_SEGMENTS][DISTANCE_SEGMENTS][POWER_SEGMENTS][AIM_FACTORS];
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

    void saveFactors() {
        try {
            ZipOutputStream zipout = new ZipOutputStream(new RobocodeFileOutputStream(getDataFile(enemyName + ".zip")));
            zipout.putNextEntry(new ZipEntry(enemyName));
            ObjectOutputStream out = new ObjectOutputStream(zipout);
            out.writeObject(aimFactors);
            out.flush();
            zipout.closeEntry();
            out.close();
        }
        catch (IOException e) {
            System.out.println("Error saving factors: " + e);
        }
    }

As I understand very little of Java's I/O model, I just hacked on the serialization code until it compiled. It works. But if someone with more understanding sees a danger here, please feel free to improve the code. -- PEZ

Looks to me like it should work. This is similar to what FloodMini does, actually (except I use the GZIP streams instead of the Zip streams). FloodHT does it a little more efficiently by making a series of nested loops and using out.writeInt() and out.readInt(). -- Kawigi
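FloodHT's nested-loop approach, as described above, might look roughly like this minimal sketch (the class name, method names, and array dimensions here are hypothetical, and a plain FileOutputStream stands in for Robocode's RobocodeFileOutputStream so it runs anywhere):

```java
import java.io.*;
import java.util.zip.*;

public class NestedLoopSave {
    // Hypothetical segment dimensions; a real bot would use its own constants.
    static final int A = 3, B = 4, C = 5;

    static void save(int[][][] factors, File f) throws IOException {
        // GZIP stream wrapped around the file, ints written in a fixed loop order.
        DataOutputStream out = new DataOutputStream(
            new GZIPOutputStream(new FileOutputStream(f)));
        for (int i = 0; i < A; i++)
            for (int j = 0; j < B; j++)
                for (int k = 0; k < C; k++)
                    out.writeInt(factors[i][j][k]);
        out.close();
    }

    static int[][][] restore(File f) throws IOException {
        // Read the ints back in exactly the same loop order.
        DataInputStream in = new DataInputStream(
            new GZIPInputStream(new FileInputStream(f)));
        int[][][] factors = new int[A][B][C];
        for (int i = 0; i < A; i++)
            for (int j = 0; j < B; j++)
                for (int k = 0; k < C; k++)
                    factors[i][j][k] = in.readInt();
        in.close();
        return factors;
    }
}
```

This avoids the object-serialization header overhead, at the cost of hard-coding the loop structure to the array's shape.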

I considered this too, but firstly I don't need it yet (my files are 1.7k per opponent at the moment, after 10k rounds) and secondly I would have to rewrite the save/restore functions each time I changed my mind about the segmentation. And I change my mind often. =) -- PEZ

I am wondering about the compressibility of GZIP versus ZIP. Are they the same, or is one better? -- iiley

I will try this, but my bet is that if there is a difference it will be small. To solve your storage size problems you probably need to look elsewhere. Maybe you can e-mail me details on how you store things internally and externally today and I can help you ponder on a solution? -- PEZ

My guess is that Zip compression does better if you set it to do so. With the zip libraries, you can specify compression methods to be optimized for speed or for smallest size as well. -- Kawigi

Add VisitCountStats/LimitFileSize to this and you have really small files on disk. -- PEZ

What do I have to import to get this to work? It seems java.io isn't enough, and I can't see anything else in the API... -- Tango

Scratch that - found it! -- Tango

Also consider looking at Tityus save/restore functions. They are more straightforward and CodeSize friendly. Using gzip files like SandboxMini. -- PEZ
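A gzip-based save/restore in that spirit might look like this minimal sketch (names are hypothetical, and a plain FileOutputStream stands in for RobocodeFileOutputStream). With no zip entries to manage, it needs fewer calls than the zip version, which is part of what makes it CodeSize friendly:

```java
import java.io.*;
import java.util.zip.*;

public class GzipFactors {
    // Hypothetical sketch: serialize the whole array straight into a gzip stream.
    static void save(int[][] factors, File f) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(
            new GZIPOutputStream(new FileOutputStream(f)));
        out.writeObject(factors);
        out.close(); // also finishes and closes the gzip stream
    }

    static int[][] restore(File f) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(
            new GZIPInputStream(new FileInputStream(f)));
        int[][] factors = (int[][]) in.readObject();
        in.close();
        return factors;
    }
}
```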

I guess you could save even more space if you don't create a file per opponent but a file that contains a hashtable with the data of all opponents. --deathcon

That's what DT does, I think (at least, it only has one file in its data directory). I expect it is smaller, but also much slower, as you have to load the whole file each time. -- Tango

It's incredibly slow actually. In fact if you run a DT with a full data quota file in RoboLeague you'll see that RL writes out a warning message in the beginning of each battle. Something like "SandboxDT hasn't started after 600ms. Giving up."

Anyway, no need to write a hash table if you use regular zip files at least. They can contain several file entries. Probably quite easily glued on the above code. -- PEZ
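Gluing multiple entries onto the above scheme might look roughly like this hypothetical sketch, one zip entry per opponent. Note that the ObjectOutputStream is flushed, not closed, between entries, since closing it would also close the underlying zip stream:

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class MultiOpponentZip {
    // Hypothetical sketch: each opponent's data goes into its own zip entry.
    static void saveAll(Map<String, int[]> data, File f) throws IOException {
        ZipOutputStream zipout = new ZipOutputStream(new FileOutputStream(f));
        for (Map.Entry<String, int[]> e : data.entrySet()) {
            zipout.putNextEntry(new ZipEntry(e.getKey()));
            ObjectOutputStream out = new ObjectOutputStream(zipout);
            out.writeObject(e.getValue());
            out.flush();          // flush only; out.close() would close zipout too
            zipout.closeEntry();
        }
        zipout.close();
    }

    static Map<String, int[]> restoreAll(File f)
            throws IOException, ClassNotFoundException {
        Map<String, int[]> data = new HashMap<>();
        ZipInputStream zipin = new ZipInputStream(new FileInputStream(f));
        ZipEntry entry;
        while ((entry = zipin.getNextEntry()) != null) {
            // ZipInputStream bounds reads to the current entry, so a fresh
            // ObjectInputStream per entry reads exactly one serialized array.
            ObjectInputStream in = new ObjectInputStream(zipin);
            data.put(entry.getName(), (int[]) in.readObject());
        }
        zipin.close();
        return data;
    }
}
```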

But you don't have to read your hash table each round. Reading it the first round and saving every single round works well and is not too slow. Using regular zip files, my stats sometimes get corrupted. --deathcon

I was assuming you were only loading in the first round. If you load all the data you have on every bot every round, you are going to have an extremely SlowBot. -- Tango

I think DT is slow on startup because it is creating new statistics structures for all the opponents in its file. I can probably restrict most of this, save time and reduce the memory hit. -- Paul

Why, it could be good to have if an enemy should just spontaneously appear on the battle field in mid battle. =) -- PEZ

Now *there's* an idea for a mod. Quite easy to do, too. Just put the enemy outside the field until a random time, when it appears. You have 1 challenger, and a team of magically appearing bots. -- Tango

Hmmm... there seems to be no getDataFile() so it won't compile. Do I need to import it? Or do I need to write it? --Bayen

It's defined in the RobocodeAPI for AdvancedRobot, so you need a reference to your main tank class, and call MyTank.getDataFile(). -- Voidious

Hmm, in the guess factors, if you keep saving/restoring the factors without 'lowering' them, won't the counts eventually get, like, so high... is there a range for integers? And if you were to proportionally 'lower' them, how would you do it with so many segments? --Starrynte

Theoretically, yes. But if I remember right, the maximum a Java int can hold is 2,147,483,647, so I don't think we'll reach it very quickly. =) If it did, that bin would overflow to the minimum possible number, screwing up your results BIGTIME. So a very valid worry. Good thing we aren't using arrays of shorts to store our data. -- Skilgannon

If you use RollingAverage for your stats, as many of us do, you always have a value between 0 and 1. As for having lots of segments, it doesn't really matter if each segment is in the same scale, just that all GuessFactors in a given segment are in the same scale, so you could surely just divide all the GFs in a segment when they got too big, if you needed to. -- Voidious

  • Ok, using RollingAverage now...*discovers that there's a limit to amount of data in data file* --Starrynte
  • =) ... I'd also look at WikiTargeting / SuperNodes if you're going to be saving gun data. The basic idea is to just save the GuessFactor data for the most visited nodes. For instance, you might have 20,000 segments in your gun, but find that 95% of the time the enemy spends in just 300 of those segments. So saving the best GF for those 300 segments gives you a LOT of info about the enemy in very little space. Dookious and Phoenix use data saving in their guns based on this idea. -- Voidious
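The RollingAverage formula commonly used on the wiki can be sketched as a tiny hypothetical helper. Each bin is updated with a new entry of 1 if the wave hit that guess factor and 0 otherwise, which keeps every bin between 0 and 1 and makes overflow a non-issue:

```java
public class Rolling {
    // Hypothetical helper following the usual rolling-average formula:
    // the old value is weighted by `depth`, the new entry by 1.
    static double roll(double value, double newEntry, double depth) {
        return (value * depth + newEntry) / (depth + 1);
    }
}
```

With entries confined to [0, 1], repeated updates can only move the value inside that range; a lower depth makes the average adapt faster, a higher depth makes it more stable.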

Quick question, does the code above delete the old saved data before saving the new data? And if it doesn't, how? --Starrynte

I am pretty sure the above code would create a new file, overwriting any previous file for that bot (with the same name). But I'm not entirely sure on that. There is a delete() method in the File class, though, so you could do this to be sure: getDataFile(enemyName + ".zip").delete(). Note that this method will use a LOT of data for each bot, so you will not be able to save nearly as many opponents as with WikiTargeting. -- Voidious

Actually I have been giving the save problem some thought for such things, and if you make a thread to do the work (like pear does, I think), you could save it in something like a peak file (save the peaks in the data). Then rebuild that data into the buffers as a type of lossy saving, but also have a degradation clause giving the average amount of drop between peaks. Otherwise just save the peaks and valleys and do a straight degradation between them. I know Dookious saves the most visited bins, but I'm not sure how. --Chase-san

  • Dooki just saves which GF bin was the most visited for each SuperNode. This means (I just checked) I can save a segment in 3 bytes for very small data files. When it restores, it uses a BinSmoothing across the other bins and gives all this data a weight of 5 (as if it saw this GF for 5 firing waves). I like your idea, but I think just saving most visited is the best use of your space. In movement, saving multiple peaks seems much more worth it, though. Just my opinion, of course - I think David Alves is also more keen on saving all the segment's data instead of just the top visited GF. -- Voidious
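A hypothetical sketch of that 3-byte-per-segment idea follows. The exact record layout and smoothing profile Dookious uses may well differ; this just shows how a segment index plus best-bin index fits in three bytes, and how a single saved GF can be smeared back across a full bin array with a chosen weight:

```java
import java.io.*;

public class SuperNodeSave {
    // Hypothetical 3-byte record: two bytes of segment index, one byte of best-bin index.
    static void writeNode(DataOutputStream out, int segment, int bestBin)
            throws IOException {
        out.writeShort(segment);
        out.writeByte(bestBin);
    }

    // On restore, smooth the single saved GF back into a full bin array,
    // weighted as if it had been seen for `weight` firing waves.
    static void restoreNode(double[] bins, int bestBin, double weight) {
        for (int i = 0; i < bins.length; i++)
            bins[i] += weight / (1 + Math.abs(i - bestBin));
    }
}
```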