Downloading data to democratize geolocation?
Imagine if the WiGLE database was available for download and offline use. Imagine the unlimited possibilities for open data mashups and maps!
Fellow drivers have been donating their hard-earned wardriving data to WiGLE for over a decade. Wouldn't they want as many people as possible to share the benefit of their work? This could be an opportunity to democratize geolocation, breaking the grip of proprietary "Big Brother" services like Google and Skyhook.
60M Wi-Fi networks is a big number. But assume a simple representation of a Wi-Fi network could be 56 bytes: a 6-byte BSSID, a 4-byte latitude float, and a 4-byte longitude float. Clever delta-encoding of sorted BSSIDs or lat/longs could probably reduce the average record size even more.
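As a rough illustration of the record layout and delta-encoding idea above (a sketch only, not WiGLE's actual storage format): the three fields listed add up to 14 bytes, so the 56-byte figure presumably leaves room for fields not listed here. The names `RECORD`, `pack_record`, `delta_encode` and `delta_decode` are hypothetical, and the sample BSSIDs and coordinates are made up.

```python
import struct

# Hypothetical fixed-width layout: 6-byte BSSID followed by two 4-byte
# little-endian floats (latitude, longitude). "<" disables padding.
RECORD = struct.Struct("<6sff")


def pack_record(bssid: str, lat: float, lon: float) -> bytes:
    """Pack one network into the fixed-width layout."""
    raw = bytes.fromhex(bssid.replace(":", ""))
    return RECORD.pack(raw, lat, lon)


def delta_encode(sorted_values):
    """Replace each value in a sorted list with its gap from the previous one."""
    out = []
    prev = 0
    for value in sorted_values:
        out.append(value - prev)
        prev = value
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating the gaps."""
    out = []
    total = 0
    for delta in deltas:
        total += delta
        out.append(total)
    return out


# Made-up sample data for illustration.
sample = ["00:11:22:33:44:60", "00:11:22:33:44:55", "00:11:22:33:45:00"]
as_ints = sorted(int(b.replace(":", ""), 16) for b in sample)
deltas = delta_encode(as_ints)
record = pack_record("00:11:22:33:44:55", 40.4168, -3.7038)

print(RECORD.size)   # 14
print(deltas[1:])    # [11, 160]
```

Neighbouring BSSIDs in a sorted list tend to differ by small amounts, so the gaps are small integers that a general-purpose compressor or a variable-length integer encoding can store in fewer bytes than the full 6-byte value.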
A database dump of 60M 56-byte records would only be a little more than 3 GB. This could probably be compressed to less than 2 GB. To minimize bandwidth costs, the compressed database dump could be seeded as a torrent file, updated monthly.
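The size estimate above is simple multiplication and can be checked in a few lines. This is only a back-of-the-envelope sketch: `dump_size_gb` is a hypothetical helper, GB here means 10^9 bytes, and the 14-byte row is the sum of the three fields named in the post (BSSID, latitude, longitude).

```python
NETWORKS = 60_000_000  # network count quoted in the post


def dump_size_gb(record_count: int, bytes_per_record: int) -> float:
    """Uncompressed dump size in gigabytes (10**9 bytes)."""
    return record_count * bytes_per_record / 1e9


full = dump_size_gb(NETWORKS, 56)     # the 56-byte record assumed in the post
minimal = dump_size_gb(NETWORKS, 14)  # BSSID + latitude + longitude only

print(f"{full:.2f} GB")     # 3.36 GB
print(f"{minimal:.2f} GB")  # 0.84 GB
```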
Keep drivin'!
thanks,
chris
me, i like to imagine a world of ponies, being ridden around by kittens.
on and off (more off than on, frankly) we've been looking into ways to decentralize wigle,
establish a peering system of self-hosting datasets, etc. there are exciting problems
in that space when you also want to provide global-scale aggregation, which is one of many
reasons we don't already have such a beast.
In the forums, someone said the WiGLE web API is undocumented and limited to 11,000 results to prevent automated scrapers from killing the server. To address those resource concerns, it seems pretty straightforward to offload the data available from the web API to a database dump on bittorrent or rsync.
For comparison, the hostip.info and MaxMind GeoIP sites publish their IP geolocation databases using rsync and tarballs. They post new database updates about once a month.
I would like to second coldspell's request. More to the point, I am interested to know whether the reason there has not been a data release to date is actually technical or rather commercial. I would not be surprised if you could satisfy 95% of people after large-scale data with a quarterly bittorrent release of the data, similar to (say) Stack Overflow. Based on everything WiGLE has implemented to date, surely creating a public data dump would be comparatively trivial and is not a deep technical problem? Which is why I am led to wonder whether WiGLE's motivation for not releasing the data is commercial... If you could clarify whether this is the case, that would be useful. If it is being held captive for commercial purposes, I think WiGLE's users would appreciate it being openly acknowledged. If that's not the case, then I think a lot of people would appreciate a simple means of accessing an aggregate data dump, without the limitations inherent to a throttled query API.
i'd say "you must be new here", but you are, obviously.
everything is easy until you look into what's actually involved.
your specific interests do not, in fact, generalize well; if you'd like to work the kinks out of a distributed, available, CAP-aware geospace map/reduce system: i will be your first live user.
while neophyte handwaving is entertaining, it is not conducive to solutions.
Help me understand - how is the data actually stored on the backend? Is that itself a distributed, available, CAP-aware geospace map/reduce system, or are you speaking in reference to what you envisage as the data access API being? Do you envisage a constantly current source of data, and is that where the complexity is coming from (as opposed to a quarterly bittorrent upload), or is the complexity in the export process itself? How much data are we talking about? What sort of database is it stored in? I am genuinely interested in knowing more about this, and would be willing to at least look at how a data export might be done if you're willing to share more info about how things currently are on the backend. Is contributing points the limit of the WiGLE project, or are you also open to developers interested in contributing on the backend? You correctly infer that handwaving is the best I can bring to the table at the present time, though with some more information, that may change.
Hi,
I am actually interested in this data for my research, and I am willing to collaborate in its distribution.
I think, as a first release, we should keep it simple and manageable. Perhaps selecting the data for the last three years, divided by countries (roughly selected by coordinates), or continents, depending on the size.
How to implement it? Firstly, I think the optimal format is HDF5 [1]. It is fast, small, flexible, and compatible with many languages. As Bryce suggested, we could launch a torrent, so the WiGLE server will not suffer. To be more concrete, the process would be:
1) Extract data and build HDF5 files inside the server (I volunteer for creating the scripts).
2) Distribute the torrent to a short list of volunteers (myself included) that will host it.
3) Delete it from the server and publish the torrent. Everybody can start downloading from these people.
In order to make this painless, we could also do it in steps, like US first (for obvious reasons) and Spain (if I am to help, I want that data), then other parts of the world.
Once the data is out, we would be one step closer to maybe someone starting a good interface to it.
[1] http://www.hdfgroup.org/HDF5/
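The "divided by countries (roughly selected by coordinates)" step could be sketched as a bounding-box filter. Everything here is an assumption for illustration: the `REGIONS` boxes are coarse rectangles rather than real borders (the Spanish box also covers Portugal, for instance), the `(bssid, lat, lon)` tuple shape is hypothetical, and writing each bucket out to a file in whatever format is chosen is left out.

```python
from collections import defaultdict

# Rough, hypothetical bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# These are coarse rectangles for illustration, not real borders.
REGIONS = {
    "ES": (36.0, 43.8, -9.3, 3.3),
    "US": (24.5, 49.4, -124.8, -66.9),
}


def region_for(lat: float, lon: float) -> str:
    """Return the first region whose box contains the point, else 'OTHER'."""
    for name, (min_lat, max_lat, min_lon, max_lon) in REGIONS.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return name
    return "OTHER"


def partition(records):
    """Group (bssid, lat, lon) tuples by rough region."""
    buckets = defaultdict(list)
    for bssid, lat, lon in records:
        buckets[region_for(lat, lon)].append((bssid, lat, lon))
    return dict(buckets)


# Made-up sample observations.
sample = [
    ("00:11:22:33:44:55", 40.4168, -3.7038),   # Madrid
    ("00:11:22:33:44:56", 40.7128, -74.0060),  # New York
    ("00:11:22:33:44:57", 35.6800, 139.6900),  # Tokyo
]
buckets = partition(sample)
print(sorted(buckets))  # ['ES', 'OTHER', 'US']
```

A staged release (US first, then Spain, then the rest) would then amount to exporting one bucket at a time.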
I apologise for my lack of clarity: the file format is NOT the issue.
I third this request! To top it off you've brushed off two people that were offering to open up your data (wanting nothing for their time but access to said data). What you are doing is antithetical to the notion of "collaborative" working. Users do the leg work whilst you reap the benefit of the data (carefully avoiding answering any questions regarding the commercial use of the data).
you'll find over the years, I address all topics with the same level of seriousness.
you'll also find no greater supporters, advocates, and general all-around wardriving boosters.
if you think running this site (and being large-scale net observers ourselves) isn't leg work,
you have not been paying attention.
I have been paying attention. You make valid points, HOWEVER, you still haven't answered the question about accessing the data.
not getting the answer you want is not the same as not having the question answered :-)
Reasons aside, the key WiGLE maintainers may not be looking to work on large-scale opening up of the data at this time. For the researchers/others here mostly interested in access to large-scale raw data, perhaps the best approach would be to fork the WiGLE Android client (https://github.com/wiglenet/wigle-wifi-wardriving) and work on replacing the network code to transmit observations off to another server. Said new client would then need to be released onto Google Play under a different name. The server could be set up from the get-go to focus not on the presentation/inference aspect of the collected data, but rather purely as a repository of all future observations that can either be accessed through a real-time API or via periodic torrent seeds. There might not even be a web interface to view the data. The server is purely there for the purpose of creating a public record of AP observations and it can be left up to third parties to devise interfaces to the data that use the API for access.
The downside to this of course is that there's no way for the API to access historical data already collected with WiGLE, plus the fragmentation that results from having two similar apps, the time required to build up a sufficient user base, etcetera. Under the current circumstances though it may be the most pragmatic approach to guaranteeing an open data set in future. There's probably quite a bit of value in having a fully open worldwide dataset of AP details, a la OpenStreetMap. It may even be the kind of project that someone can get funding for to get up and running plus ongoing server hosting support etc.
there have been many other projects come and go over the years.
we're always happy to see them come, and always sad when they fade.
I came to the forums to post a request for database dumps when I saw that the original poster had made exactly the points I was going to make (a database dump would not have to be large + torrent distribution). So I'll second his request.
I understand that a fully decentralized WiGLE would be a lot of work to implement, but that is not what we are asking for here. We are simply asking for a dump of the observations. I routinely work with larger data sets than this, and I do not see why this would be a burden to implement.
I agree with MegGeo that you brushed off the other posters. Your replies amount to "We can't make downloadable database dumps because a fully decentralized peer-to-peer WiGLE with bells and whistles is too much work". That is like saying "I can't build a bridge because launching a moon rocket is too difficult" - it's a non sequitur.