The Upstream Tracker Project
GSoC 2012 is well under way, and it's time I wrote a bit about my project this year. I am working with the openSUSE project again, this time on the Upstream Tracker project under the mentorship of Vincent Untz. In this post, I will try to give a brief overview of the tool and how it works. I will focus on the details in subsequent posts.
The ‘Upstream Tracker’ project aims to be a central hub where the upstream versions of open source software can be continually monitored for new releases. One problem with open source software is that it is completely decentralized. Every project has a place of its own: some are hosted at Sourceforge, others at Google Code, and the rest in a variety of other places. This decentralized approach has its advantages, but it also has its downsides. For packagers and package maintainers in a community, it is tedious to keep track of all the packages they maintain and manually check for new releases. Also, for minor releases, the code change is often small enough that the packaging needs almost no change, yet package maintainers currently still have to be involved to some degree. At a later stage, the project would also be able to give a comparison table of the latest version available upstream against the latest version available on a particular release of a Linux distro. This comparison could be of great value to developers and packagers as well as users.
With the Upstream Tracker project, we crowdsource data from users that helps us look for upstream versions of a given package. The project is divided into two distinct parts: the Rails frontend, where users can submit data and look up the results, and the Python backend, which does all the heavy lifting using the data from the Rails DB. A user is required to input three key parameters: a package name, the download URL, and the method used to process the record/package.
Once the user inputs the parameters, the Python backend swings into action by collecting records from the Rails DB and processing them one by one. The download URL is opened and all the links on the page are collected. The links are then filtered, and only those that satisfy a few basic requirements (e.g. the file name should be of the format -.tar.) are kept. Finally, the version strings are extracted from the file names and sorted, and the latest version is returned along with the complete location of the file. The record in the Rails DB is then updated to reflect the latest version and the URL of the file.
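To make the steps above concrete, here is a minimal sketch of the collect, filter, extract, sort sequence. It is not the project's actual code: the function and class names are made up for illustration, the file name pattern (name, dash, dotted numeric version, `.tar.`, compression suffix) is my assumption about what a "basic requirement" looks like, and the page is passed in as a string so that the example runs without network access.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed file name shape: name-version.tar.ext, e.g. foo-1.2.10.tar.gz
TARBALL_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)\.tar\.(?:gz|bz2|xz)$"
)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def version_key(version):
    """Turn '1.2.10' into (1, 2, 10) so that versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_release(page_html, base_url, package):
    """Return (version, absolute_url) of the newest tarball, or None."""
    collector = LinkCollector()
    collector.feed(page_html)

    candidates = []
    for href in collector.links:
        filename = href.rsplit("/", 1)[-1]
        match = TARBALL_RE.match(filename)
        # Keep only tarballs that belong to the requested package.
        if match and match.group("name") == package:
            version = match.group("version")
            candidates.append(
                (version_key(version), version, urljoin(base_url, href))
            )

    if not candidates:
        return None
    _, version, url = max(candidates)
    return version, url
```

Sorting on the numeric tuple rather than the raw string matters: a plain string sort would rank 1.2.9 above 1.2.10.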
Currently, these methods are supported:
- HTTPLS – Download files listed as links on an HTTP page
- FTPLS – Download files on FTP
- DualHTTPLS – Download files listed as links on two HTTP pages (One for release and one for testing)
- LP – For projects hosted on Launchpad
- SF – For projects hosted on Sourceforge
- Google – For projects hosted on Google Code
- Trac – Download files hosted using Trac
- SVNLS – Download files on SVN with Web Access
The user can look up the results of the processing on the web interface, where packages with errors are highlighted in red and those without errors in green. Packages yet to be processed are not highlighted. A separate error page lists the erroneous records, so that it is easier for people to suggest corrections and update them.
The form also supports a separate method called Custom, wherein the user can enter a custom URL with a regex, similar to Debian watch files. This forms the base of DEHS imports, wherein Debian watch files can be imported directly.
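As a rough illustration of the Custom method, the sketch below applies a user-supplied regex to a list of links and treats the first capture group as the version string. This is loosely modeled on how watch-file patterns are written and is only my assumption about the matching rules; `match_custom` is a hypothetical name, not a function from the project.

```python
import re


def match_custom(links, pattern):
    """Return (version, link) pairs for links that fully match the regex.

    Assumption: the first capture group of the pattern holds the version.
    """
    regex = re.compile(pattern)
    results = []
    for link in links:
        match = regex.fullmatch(link)
        # Skip patterns without a capture group: there is no version to read.
        if match and match.groups():
            results.append((match.group(1), link))
    return results
```

For example, the pattern `foo-(.+)\.tar\.gz` would pick up `foo-1.1.tar.gz` but skip `foo-1.1.tar.gz.sig`, because the whole link has to match.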
The Python backend is threaded and processes only X files every few minutes. This spreads the load evenly and hence requires fewer resources.
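One way to picture this batching is the sketch below, which takes the first few pending records, checks them on a small thread pool, and hands back the rest for a later run. The use of `ThreadPoolExecutor`, the `process_batch` name, and the `check` callback are all assumptions made for the example; the real backend may be organized quite differently.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(pending, batch_size, check, max_workers=4):
    """Process at most batch_size records; return (processed, remaining).

    check is a hypothetical callable that looks up one record.
    """
    batch, remaining = pending[:batch_size], pending[batch_size:]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as the batch.
        results = list(pool.map(check, batch))
    return list(zip(batch, results)), remaining
```

A scheduler would then call `process_batch` once per interval and pass the remaining records back in on the next run, so no single run has to touch every package.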
The project is being actively worked on, and more features will be added over the next few weeks. So far, the basics are tested and work fine. Here are a few screenshots to show the workflow.