How to do substitutions (like the Perl s/// operator) in rewrites?

Posted by bartschipper 
January 10, 2010 09:08AM
I recently migrated a web site from Apache to Nginx (and from CMS system x to CMS system y).
Almost all of the rewrites work in the new CMS, except for one older class of article URLs:
http://example.com/News/Articlepage-News/This-is-the-best-News-EVER.htm (example)

The direct mapping from these literal URLs to the new URLs is lost.
I do, however, have a file (200k+ entries) that maps the titles to article IDs, like so:
# old-class-urls.txt: (first two commented lines are not actually in the file)
# all lowercase title without spaces and hyphens     article-id
thisisthebestnewsever                                123456;
...
In Apache I used a RewriteMap to rewrite these URLs:
# httpd.conf:
...
RewriteMap articles prg:/etc/httpd/rewrites/old-class-article-urls.pl
...
# old-class-article-urls.pl:

#!/usr/bin/perl

$| = 1;

###############################################
# code to be executed at startup of webserver #
###############################################

open(TEXTFILE,"</etc/httpd/rewrites/old-class-urls.txt");      # loading the rewrite map in memory
@lines = <TEXTFILE>;
close TEXTFILE;

foreach $line (@lines) {                                    # load the data in an associative array for fast lookup
   ($keyword,$article_id) = split(/\s+/,$line);
   $keys{$keyword} = $article_id;
}

##########################################
# code to be run for every requested URL #
##########################################

while (<STDIN>) {
   $url = $_;
   chomp($url);
   if($url =~ /\/([^\/]+)\.htm$/) {                         # a match could be made
      $keyword = lc($1);
      $keyword =~ s/-//g;
      if($keys{$keyword}) {
         print  'old-class-url.php?articleid=' . $keys{$keyword}."\n";
         next;
      }
   }
   print "Not found\n";                                     # no match could be made
}
I was hoping something similar or even easier could be done with Nginx:

map $uri $old-class-url {
        include /etc/nginx/rewrites/old-class-urls.txt;
}
...
server {
...
    location ~* ^/News/Articlepage-News/.*htm { 
        rewrite ^/News/Articlepage-News/(.*)htm $1 ;
###
### change $uri to lowercase and remove the hyphens...
### I am looking for something equivalent like in perl:
### s/-//g;
### s/.*/\L{$1}/;
###
        if ($old-class-url) {
                rewrite  ^    /old-class-url.php?articleid=$old-class-url   permanent;
        }
    }
}

I have seen questions about similar functionality in the forums, but none with a solution:
http://forum.nginx.org/read.php?2,34788
http://forum.nginx.org/read.php?9,2511

Is it possible to solve this issue this way or do you recommend a different solution?

Thanks in advance,
Bart
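
The transformation asked for in this post (take the file name part of the old URL, lowercase it, drop the hyphens) can be tried outside the web server. The following is a minimal Python sketch, not part of the original setup: the function name `normalize_old_url` is hypothetical, and the regex simply mirrors the one in the Perl script above.

```python
import re

# Mirrors the Perl regex /\/([^\/]+)\.htm$/: the last path segment ending in ".htm".
_OLD_URL = re.compile(r"/([^/]+)\.htm$")


def normalize_old_url(path):
    """Return the lookup key for an old-class URL path, or None if it does not match.

    Hypothetical helper: lowercases the file name part and removes hyphens,
    like lc($1) followed by s/-//g in Perl.
    """
    match = _OLD_URL.search(path)
    if match is None:
        return None
    return match.group(1).lower().replace("-", "")


if __name__ == "__main__":
    # Example URL taken from the post above.
    print(normalize_old_url("/News/Articlepage-News/This-is-the-best-News-EVER.htm"))
    # prints: thisisthebestnewsever
```

The key it produces is the form stored in old-class-urls.txt, so the same function can be used to spot-check individual URLs against that file.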
Hello!

On Sun, Jan 10, 2010 at 09:08:48AM -0500, bartschipper wrote:

> I recently migrated a web site from Apache to Nginx (and from CMS-system x to CMS system y)
> Almost all of the rewrites are OK in the new CMS, except for one older class of article URLs:
> http://example.com/News/Articlepage-News/This-is-the-best-News-EVER.htm (example)
>
> The direct mapping for these literal urls to the new urls is lost.
> I do however have a file (200k+ entries) that maps the titles to article-id's like so:
>
> # old-class-urls.txt: (first two commented lines are not actually in the file)
> # all lowercase title without spaces and hyphens article-id
> thisisthebestnewsever 123456;
> ...

[...]

> map $uri $old-class-url {
> include /etc/nginx/rewrites/old-class-urls.txt;
> }
> ...
> server {
> ...
> location ~* ^/News/Articlepage-News/.*htm {
> rewrite ^/News/Articlepage-News/(.*)htm $1 ;
> ###
> ### change $uri to lowercase and remove the hyphens...
> ### I am looking for something equivalent like in perl:
> ### s/-//g;
> ### s/.*/\L{$1}/;

There is no easy way to do this without perl as of now. With
embedded perl it's trivial though.

Maxim Dounin

Re: How to do substitutions (like the Perl s/// operator) in rewrites?
January 10, 2010 12:09PM
hi

The map module is case-insensitive; it uses ngx_hash_strlow to produce the key internally, so this is not a problem.

For the hyphens problem I would generate a map file with two keys for each article ID, one with and one without hyphens, like:

title-1 1
title1 1

cheers, bernd
Re: How to do substitutions (like the Perl s/// operator) in rewrites?
January 11, 2010 03:28AM
Thank you for your responses!

I think Maxim is right and embedded Perl is the way to go.

Bernd's idea would work if there was an easy way to generate a map file with hyphens from the source file.
However the source file has 200k+ entries that look like this:
thisisthebestnewsever                            123456;
expertsadvicetoeatmorefish                       123457;
...
I cannot think of an automated way to convert this to:
this-is-the-best-news-ever                       123456;
thisisthebestnewsever                            123456;
experts-advice-to-eat-more-fish                  123457;
expertsadvicetoeatmorefish                       123457;
...
Knowing that the map module is case-insensitive may save me from confusion in the future.

Thanks again,
Bart
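
Removing hyphens is a many-to-one operation, which is why the hyphenated keys cannot be regenerated from the stripped ones. A small Python sketch illustrates this; the helper names and the title "thi-sis" are made up for illustration, and the count assumes at most one hyphen between adjacent characters.

```python
def strip_hyphens(title):
    """Lowercase a title and remove its hyphens (the easy direction)."""
    return title.lower().replace("-", "")


def candidate_hyphenations(key):
    """Count the hyphenated titles that strip down to `key`.

    Each of the len(key) - 1 gaps between characters either holds a hyphen or not.
    """
    return 2 ** max(len(key) - 1, 0)


if __name__ == "__main__":
    # Two different hyphenated titles collapse to the same key.
    print(strip_hyphens("this-is") == strip_hyphens("thi-sis"))  # prints: True
    # A 21-character key has 2**20 possible hyphenations.
    print(candidate_hyphenations("thisisthebestnewsever"))  # prints: 1048576
```

Without the original titles there is no way to pick the right candidate, so normalizing the incoming URL is the only direction that works.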
Re: How to do substitutions (like the Perl s/// operator) in rewrites?
January 11, 2010 04:18PM
Issue solved!

I followed Maxim's advice and compiled embedded perl in. However, I found the configuration not as trivial as Maxim claimed: embedded perl documentation is scarce and examples are rare.

The following configuration solved it for me and it may serve as an example for others of using the map module together with embedded perl:
http {
...

# use the map module to include a list of keys and corresponding article IDs
# (nginx variable names may only contain letters, digits and underscores)
  map $uri $old_class_url {
    include /etc/nginx/rewrites/old-class-urls.txt; # (see the first post for a sample of the contents)
  }

# lowercase the file name part of the URI and remove hyphens with Perl; the result is returned in the nginx variable $old_uri
  perl_set $old_uri 'sub {
    my $r = shift;
    my $uri = $r->uri;
    if ($uri =~ /\/([^\/]+)\.htm$/) {
        $uri = lc($1);
        $uri =~ s/-//g;
    }
    return $uri;
  }';
...
  server {
...
    location ~* ^/News/Articlepage-News/.*\.htm$ {
# rewrite $uri to the value of $old_uri
      rewrite ^ $old_uri ;
# redirect to the article ID if the rewritten $uri appears in the map
      if ($old_class_url) {
        rewrite  ^    /old-class-url.php?articleid=$old_class_url   permanent;
      }
    }
...
  }
}

I welcome any suggestion for a simpler solution.

Bart Schipper
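
Because the map keys must already be in the normalized form that the `perl_set` handler produces, it can be worth checking the include file before reloading nginx. Below is a minimal Python sketch under stated assumptions: every entry has the form `key value;` as in the sample from the first post, and the helper name `check_map_lines` is hypothetical. Keys are compared in lowercase because, as noted earlier in the thread, the map module lowercases them internally.

```python
def check_map_lines(lines):
    """Return a list of (line_number, problem) tuples for suspicious entries."""
    problems = []
    seen = {}  # lowercased key -> line number of first occurrence
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith(";"):
            problems.append((number, "missing trailing semicolon"))
            continue
        fields = line[:-1].split()
        if len(fields) != 2:
            problems.append((number, "expected exactly: key value;"))
            continue
        key, _article_id = fields
        if "-" in key:
            # The normalized URI never contains hyphens, so this key cannot match.
            problems.append((number, "key contains a hyphen"))
        lowered = key.lower()
        if lowered in seen:
            problems.append((number, "duplicate of line %d" % seen[lowered]))
        else:
            seen[lowered] = number
    return problems


if __name__ == "__main__":
    sample = [
        "thisisthebestnewsever            123456;",
        "experts-advice-to-eat-more-fish  123457;",
        "ThisIsTheBestNewsEver            123458;",  # made-up entry for illustration
    ]
    for number, problem in check_map_lines(sample):
        print(number, problem)
```

To check the real file, the lines can be read with `open("/etc/nginx/rewrites/old-class-urls.txt")` and passed to the same function.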