Hashmaps vs storing data in the tree

Revision as of 26 July 2012 at 13:15.
The highlighted comment was edited in this revision. [diff]

Hashmaps vs storing data in the tree

I was working on my kd-tree and one of the changes I made was to store the data in the tree rather than in a hashmap, thinking that this would save time. However, in my benchmarks, it is much faster to use a hashmap. Does anyone know why this would be the case?

AW‎

It's very strange question because hashmap and kd tree is absolutly different structures with different aims and contract and for sure map faster because it's nature. May be you publish source code with usage of map and tree?

Jdev‎

O, looks like i misunderstand you. Do you mean why HashMap is faster than TreeMap? I'm not sure but i think, that hash map has efficiency O(1) but tree map O(log N) because tree map are sorted and based on red-black tree. I do not describe how they works because my english skill...

Jdev‎

What I mean is that it seems slower to store the data with the point in a KDTree than it does to store the point in the tree and then use the point as the key for the hashmap. So for example you would have:

PointEntry entry = new PointEntry(pointCoordinates, DataObject); tree.add(entry);

instead of

hashmap.add(pointCoordinates, DataObject); tree.add(pointCoordinates);

Maybe I have some typecasting in the tree that is slowing it down?

AW‎

Ok, i understand yours problem. But now i have not advices:) can you publish kdtree? Is difference only in type of data stored in kdtree? What type has pointCoordinates?

Jdev‎

IIRC, my experience on this matter is that it's faster to use HashMap than to store PointEntry objects that have both pointCoordinates and DataObject, but it's faster than either to in your leaf nodes store two arrays, one for the "pointCoordinates" values and one for the "dataObject" values.

I'm pretty sure the reason for this is that with both using a HashMap for data objects, and using two separate arrays for coordinates and data, you avoid some pointer dereferencing. Basically, it's bad for performance if when comparing the location of a point, you have to dereference a PointEntry object and the coordinates themselves, rather than just the coordinates.

One of the important things to remember when dealing with Java, is that unlike some other languages (i.e. C++) every object is another pointer you have to dereference, which makes an array of primitives faster than an array of objects, and also makes two arrays of objects faster than one array of objects that store two objects each.

Rednaxela‎

Hashmaps vs storing data in the tree

Hashmaps vs storing data in the tree

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools