How to explain open source to your grandparents

I imagine I’m not alone in having parents and grandparents who don’t really understand what I do for a living. “I work in computing and do stuff with maps” is the easy approach (in fact it’s easier now that I don’t have to tag on the bit about being an archaeologist but not actually digging, and no, it’s not like Time Team or Indiana Jones). Sometimes people ask why we don’t just “do everything with Google Maps”, which is the cue for a sit-down and a longer chat about how (deep breath) you can’t do *everything* with Google. So far, so good…

This all changed a couple of weeks ago when, to my surprise, I got elected to the Board of Directors of OSGeo. Cue shock, and great rejoicing (and in my head at least, tearful Oscars-style acceptance speeches). Mr Archaeogeek thinks this is cool enough to tell parents, grandparents, family friends etc. I do too, don’t get me wrong, but…

How to explain what it means to people who don’t use computers all that much, let alone know about GIS, open source or OSGeo?

There have been a few good articles on how to explain why you work in open source (the “how do you make money?” argument), but I feel like I need to go further back and explain about software licensing. I don’t think that most people really understand the difference between the way software is sold/licensed and the way most other products are, so I’m working on a car analogy that explains why open source software needs to exist. It goes something like:

“Can you imagine if you bought a car and, when something went wrong with it, you couldn’t lift up the bonnet and take a look? Or if you did need to take it to a garage, you had to use the one the car salesman told you to use? Or after 2 years they told you that they wouldn’t support your old car any longer, and you had to buy a new one? Or you couldn’t insure more than one person to drive it, but had to buy a new car for each person? Well, that’s sort of what closed-source software is like.”

(Note this is an analogy-in-progress)

Once that’s out of the way, I can get on to GIS, and hence to OSGeo, and all is fine!

Setting up a PostgreSQL standby server

Over the last couple of months I have been investigating options for setting up a standby server for PostgreSQL. You know, the sort of magical thing that stops your day/night/week being totally wrecked when, to quote Joel Spolsky, you “go crying to the system administrator and asking piercingly sad questions about why the backup system is “temporarily” out of commission and has been for the last eight months”. If you are a relative beginner to all of this and look in the PostgreSQL documentation, you will get totally overwhelmed and confused (well I did, anyway), as the information you need is spread across several chapters and is dense, even by the PostgreSQL documentation’s standards. So, when I found some documentation that was actually quite clear and concise, and got me through the process without too much angst, I thought I’d record my notes here. Big disclaimer: these are by no means comprehensive or complete. No flaming!

Basic Idea
We are setting up a warm standby postgresql server with streaming replication. In the event of a failure of the primary server, the standby server will take over but will not be queryable until that point. Streaming replication reduces the window of data loss between a primary server failing and the standby server being in a position to take over.

Important Notes
Additional network and operating system specific software will also be required to handle the change in IP address and the trigger to move the standby server into read/write mode (see later). The two servers must be as identical as possible. In particular the same version of postgresql must be installed and the servers must have the same architecture (32 or 64bit).

Primary Server Preparation: postgresql.conf

Set up continuous archiving on the primary server by setting the following parameters in postgresql.conf (note that the archive_command line has been split over two lines for readability- in real life that would be one line):

 archive_mode = on
 archive_command = 'cp %p
   /location/where/write-ahead-logfiles/should/go/%f'
 wal_level = archive
 max_wal_senders = 5
 wal_keep_segments = 32
 listen_addresses='*'

The location of the write-ahead-logfiles (WAL) should be accessible to both the primary and the standby server, and preferably remote from the primary server (for obvious reasons). The %p and %f symbols are specific to the archive_command and are substituted for the path and the file name respectively when the command is executed.
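To make the substitution a bit more concrete, here is a minimal Python sketch that mimics how the %p and %f placeholders get filled in. It is only an illustration, not how PostgreSQL does it internally: the function name expand_archive_command and the sample WAL file name are made up for the example, and it also treats %% as a literal percent sign.

```python
import os
import re


def expand_archive_command(template: str, wal_path: str) -> str:
    """Fill in %p (path of the file), %f (file name only) and %% (literal %)."""
    values = {
        "p": wal_path,
        "f": os.path.basename(wal_path),
        "%": "%",
    }
    # Scan left to right so that "%%" is consumed before a following letter.
    return re.sub(r"%([pf%])", lambda match: values[match.group(1)], template)


# Hypothetical WAL segment, given relative to the data directory.
expanded = expand_archive_command(
    "cp %p /location/where/write-ahead-logfiles/should/go/%f",
    "pg_xlog/000000010000000000000001",
)
print(expanded)
```

Running a command template through something like this before putting it in postgresql.conf is a cheap way to spot a missing %f or a mistyped path.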

Primary Server Preparation: pg_hba.conf

Add a connection from the standby server to the main server as follows:

host replication [user] [addressrange] md5

The database name “replication” should not be changed- it’s a pseudo-database connection specifically to allow replication to take place and does not reflect an actual database.
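If you are unsure whether the [addressrange] you have written actually covers the standby server, a quick check along these lines can help. This is just a sketch using Python's standard ipaddress module; it assumes the range is written in CIDR notation, and the function name and the addresses are invented for the example.

```python
import ipaddress


def standby_matches_range(standby_ip: str, address_range: str) -> bool:
    """Return True if standby_ip falls inside the CIDR address_range."""
    # strict=False tolerates host bits being set, e.g. "192.168.1.5/24".
    network = ipaddress.ip_network(address_range, strict=False)
    return ipaddress.ip_address(standby_ip) in network


# Hypothetical addresses, purely for illustration.
print(standby_matches_range("192.168.1.20", "192.168.1.0/24"))
print(standby_matches_range("10.0.0.5", "192.168.1.0/24"))
```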

Standby server configuration

Ensure that the location of the WAL files is accessible from the standby server and that a connection can be made to the database on the primary server.

Create a file called “recovery.conf” in the data directory for the standby server. This should have the following parameters as a minimum (the primary_conninfo parameter has been split over two lines here for readability):

 standby_mode='on'
 primary_conninfo = 'host=primaryhostIP user=databaseuser
   password=databasepassword port=port'
 trigger_file='/tmp/psql.trigger'
 restore_command = 'cp /path/to/WALfiles/%f %p'

Substitute the correct connection details in primary_conninfo, and the path to the remote WAL logs as appropriate. Also substitute operating-system specific file copy commands in the restore_command entry. The trigger file does not have to exist at this point. Its purpose is to tell the standby server when to move into read/write mode (see later). You may also wish to replicate the settings from the primary server’s pg_hba.conf to ensure that all connections will be allowed in the event that the standby server is used.
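Since primary_conninfo has to end up on a single line in the real file, one option is to generate recovery.conf rather than type it by hand. The following is a rough Python sketch of that idea, not an official tool: the function name render_recovery_conf, its arguments and the sample values are all assumptions for illustration, and it assumes none of the connection values contain spaces.

```python
def render_recovery_conf(host, user, password, port, trigger_file, wal_archive):
    """Return the text of a minimal recovery.conf, one parameter per line."""

    def quote(value):
        # Values are single-quoted; an embedded single quote is doubled.
        return "'" + str(value).replace("'", "''") + "'"

    conninfo = f"host={host} user={user} password={password} port={port}"
    settings = [
        ("standby_mode", "on"),
        ("primary_conninfo", conninfo),
        ("trigger_file", trigger_file),
        ("restore_command", f"cp {wal_archive}/%f %p"),
    ]
    return "".join(f"{name} = {quote(value)}\n" for name, value in settings)


# Hypothetical connection details, for illustration only.
recovery_text = render_recovery_conf(
    host="192.0.2.10",
    user="replicator",
    password="secret",
    port=5432,
    trigger_file="/tmp/psql.trigger",
    wal_archive="/path/to/WALfiles",
)
print(recovery_text)
```

Writing the returned text to recovery.conf in the standby data directory is left out on purpose, so the sketch can be run safely anywhere.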

Take a base backup of the primary server

A base backup is not the same as a database dump. It is a backup of all the files in the postgresql data directory. The location of this can vary. In linux, you can find it by running the following command at a command prompt:

 ps auxw | grep postgres | grep -- -D

In any operating system, if you can connect to the database server (eg with pgadmin3) then enter the following SQL command:

 SHOW data_directory;

You can take a backup of the data directory when the server is running, or when it is stopped. It is simplest to take it when the database server is stopped (and it need not be stopped for very long).

Backing up data directory when postgresql is stopped

  • Stop the database in the appropriate way (eg /etc/init.d/postgresql stop on linux or stop the service in windows).
  • Backup the entirety of the data directory as found above, using whatever method you like, such as tar.
  • Copy the backup file to somewhere accessible to the standby server
  • Restart the database
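If tar isn't to hand (on Windows, say), the "back up the entire data directory" step can be sketched with Python's standard tarfile module. This is only an illustration under a few assumptions: the server is already stopped, the script runs as a user that can read the data directory, and the function names are my own. The demonstration at the bottom uses a throwaway directory rather than a real data directory.

```python
import tarfile
import tempfile
from pathlib import Path


def backup_data_directory(data_dir, archive_path):
    """Write every entry of data_dir into a gzip-compressed tar archive."""
    data_dir = Path(data_dir)
    with tarfile.open(str(archive_path), "w:gz") as archive:
        for entry in sorted(data_dir.iterdir()):
            # Store paths relative to the data directory itself.
            archive.add(str(entry), arcname=entry.name)
    return Path(archive_path)


def list_backup(archive_path):
    """Return the sorted member names of a backup archive."""
    with tarfile.open(str(archive_path), "r:gz") as archive:
        return sorted(archive.getnames())


# Demonstration on a scratch directory standing in for a data directory.
with tempfile.TemporaryDirectory() as scratch:
    fake_data = Path(scratch) / "data"
    (fake_data / "base").mkdir(parents=True)
    (fake_data / "PG_VERSION").write_text("9.0\n")
    (fake_data / "base" / "1").write_text("placeholder")
    archive_file = backup_data_directory(fake_data, Path(scratch) / "base_backup.tar.gz")
    backup_names = list_backup(archive_file)
print(backup_names)
```

Keeping the archive outside the data directory (as the demonstration does) avoids the backup trying to include itself.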

Backing up the data directory without stopping postgresql

  • Connect to the database server as the postgres user and issue the following command (where ‘label’ is anything you want to label the backup in the log files with):
SELECT pg_start_backup('label',true);
  • Disconnect from the server and back up the contents of the data directory as above
  • Connect to the database server as the postgres user and issue the following command:
SELECT pg_stop_backup();
  • Copy the backup file to somewhere accessible to the standby server
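The main thing to get right in the running-server procedure is the ordering: pg_start_backup, then the file copy, then pg_stop_backup, even if the copy fails part-way. Here is a hedged Python sketch of that ordering. It assumes a DB-API style connection (for example one from psycopg2, which is where the %s placeholder style comes from) and a copy function that you supply; online_base_backup is a hypothetical name, and unlike the steps above it keeps the connection open during the copy.

```python
def online_base_backup(connection, label, copy_data_directory):
    """Wrap a file-level copy between pg_start_backup and pg_stop_backup.

    connection          -- a DB-API connection opened as the postgres user
    label               -- free-text label recorded for this backup
    copy_data_directory -- zero-argument callable that performs the copy
    """
    cursor = connection.cursor()
    # The %s placeholder assumes a psycopg2-style driver.
    cursor.execute("SELECT pg_start_backup(%s, true)", (label,))
    try:
        copy_data_directory()
    finally:
        # Always leave backup mode, even if the copy raised an error.
        cursor.execute("SELECT pg_stop_backup()")
```

The try/finally is the design choice worth copying: without it, a failed copy would leave the primary server stuck in backup mode until someone notices.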

Restoring the backup to the standby server

  • Stop the postgresql service on the standby server by the appropriate means.
  • Replace the contents of the database directory with that from the backup, being sure not to overwrite recovery.conf
  • Start the postgresql service.

If you watch the log files on the standby server you should see that it will reach a stage where it is successfully connected to the primary server. Errors about the cp command not being able to find archive files are usually non-fatal, as are entries about zero-length logfiles. However, if your log files show other errors, then it is likely that the base backup has not been correctly restored. Try redoing this with the postgresql service stopped to avoid issues.

Note

We have set up a warm standby server in this scenario. This means that you cannot connect to the standby database with psql or pgadmin3 to check that it is working! The log files will simply show a cycle of synchronising with the WAL files from the remote location, and then attempting replication via TCP with the primary server.

How to bring up the standby server

In the event of a failure in the primary server, something (e.g. a heartbeat process) needs to create the trigger_file in the location specified in the standby server recovery.conf file. This can be an empty file, so on linux a simple “touch” command will be enough. Once the standby server recognises this file, it will switch to read/write mode. The switch is visible in the log files, and database connections will then be allowed.
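As a very rough idea of what such a heartbeat process might look like, here is a Python sketch that creates the trigger file after a number of consecutive failed checks. Everything in it is an assumption for illustration: the function names, the thresholds, and the idea that a plain TCP connection to the PostgreSQL port is a good enough health check (it only shows that something is listening, not that the database is healthy). A real setup would also need to make sure the old primary cannot come back and accept writes.

```python
import socket
import time
from pathlib import Path


def primary_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_primary(check, trigger_file, max_failures=3, interval=5.0, sleep=time.sleep):
    """Call check() repeatedly; touch trigger_file after max_failures misses in a row."""
    failures = 0
    while failures < max_failures:
        if check():
            failures = 0  # any success resets the count
        else:
            failures += 1
        if failures < max_failures:
            sleep(interval)
    # An empty file is enough, the equivalent of "touch" on linux.
    Path(trigger_file).touch()
    return trigger_file
```

It would be run on the standby server with something like watch_primary(lambda: primary_is_reachable("primaryhostIP", 5432), "/tmp/psql.trigger"), using the same trigger_file path as in recovery.conf. Requiring several failures in a row is there to avoid promoting the standby because of a single dropped packet.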

The recovery.conf file will be automatically renamed recovery.done once successful recovery has taken place.

In the event that the primary server is reinstated, it will be necessary to synchronise the two databases- the base backup procedure run on the standby server should be sufficient.

OSGIS 2011 Round-up

Woefully out of date now, here’s a quick run down on the OSGIS 2011 conference, 3rd in that series, held at the University of Nottingham Centre for Geospatial Sciences in Nottingham over the 21st and 22nd of June.

The 21st was a day of workshops, under the banner of Interoperability and the OGC. My new colleague, Matt, and I did a workshop on using Ordnance Survey Open Data and Mastermap with Mapserver and PostgreSQL, using the OSGeo Live DVD. You can see a slightly edited version of the workshop below, or on slideshare. I have to admit that most of the kudos must go to Matt for creating some super scripts to make the initial data processing *much* easier, and to some of my other colleagues for their efforts in styling the data once it’s in Mapserver. The scripts and a small subset of the open data are available here- you’ll have to supply your own Mastermap!

Day Two was all about the talks- and I was impressed by the standard. The focus of OSGIS has always leant slightly towards the academic, so the stand-out talks for me were the ones that demonstrated that you can do real spatial analysis with open source GIS. There were also some very good papers on mapping in the developing world. Two of my ex-colleagues from Oxford Archaeology also did a joint paper showing how the use of open source software has progressed there- that was really good to see- it was nice to know that the baton had been passed on when I left! I gave people an introduction to the OSGeo:UK local chapter, which is also available on Slideshare here, and we had the chapter AGM. It’s extremely gratifying to see the numbers of people willing to hear about, and get involved with, the local chapter. I was going back through the reports I’d given in previous years, and the numbers of people signed up to our mailing list steadily creep up, year on year- we’re now well over the 100 people mark! (BTW, if you’re interested, our website is here).

All in all it was a really good couple of days. Next year the conference will be running from the 4th to the 5th of September, so for anyone that can’t make FOSS4G and wants to give OSGIS a try, now’s your chance!

 

Conference Organisation for Beginners

I’ve been attending the AGI GeoCommunity Conference here in the UK for a few years now- and this year the AGI kindly asked me if I would sit on the working group for organising GeoCommunity 2011. Being completely new to conference organisation, and wanting to get some experience for the glorious day when OSGeo:UK holds FOSS4G in the UK, I jumped at the chance. This year’s event takes place from September 20-22nd, in Nottingham (a departure from previous years, where it has been in Stratford-upon-Avon), but the working group has met a couple of times already to get things organised. To be honest, the AGI team themselves do most of the hard work, along with the Conference Chair, but the working group decides on things like keynote and plenary speakers, assesses the papers, and decides on really important things like the theme for the party. At the event itself, I understand we have the exciting business of stuffing all the conference bags with flyers, as well as being visible throughout the event to help people out, moderate sessions, keep speakers to time etc.

Last week we all met in Nottingham to work through the paper selection. This year, around 80 abstracts were received, for approximately 50 slots. The AGI uses a blind marking process for selecting papers, so we all received the abstracts with the names and any organisational details removed and had to rank them in order. This is remarkably hard to do! It’s quite easy to identify the best and the worst papers, but deciding on the relative merits of (say) papers 53-67 is very difficult. It’s also hard to be objective about this kind of thing- everyone has their own particular likes and dislikes, and their own area of expertise. However, with a working group that represents a diverse range of interests, we did end up with a reasonable consensus at the end of this process. After the blind marking, considerably more paper shuffling took place to get a balanced set of conference streams. Grouping papers into coherent sessions and balancing out speakers was probably the hardest part of the whole process (yes, by now we knew the authors’ names!). The whole process was a lot of fun, including the occasional acts of sabotage as papers were (literally) stolen from one stream to go into another.

In a completely non-scientific assessment of the abstracts- “openness” was reasonably popular, although perhaps more from an open/crowd-sourced data perspective than an open source software one. In the final programme, however, open source software gets a mention in a number of papers spread across pretty much all of the streams. With hindsight I’m happy that this is the right approach, as it avoids ghetto-ising open source solutions and instead presents them as viable solutions to everyday problems. The whole open/crowd-sourced data debate does get its own stream though, as it’s such a popular topic at the moment.

All in all, I have to say I’m in awe of the AGI staff who make all of this look so easy. I’m also really looking forward to the event, as the programme looks really good, and the new venue should be fantastic. If you’re interested, early bird bookings are available till the end of July. For those that know about the now infamous AGI soap-box georant- its new location will be superb…

 

 
