
Google Crawl Errors Relating to SEO URL rewrite?

Printed From: ProductCart E-Commerce Solutions
Category: ProductCart
Forum Name: Using ProductCart
Forum Description: Running your store with ProductCart
URL: https://forum.productcart.com/forum_posts.asp?TID=4338
Printed Date: 04-December-2024 at 9:41pm
Software Version: Web Wiz Forums 12.04 - http://www.webwizforums.com


Topic: Google Crawl Errors Relating to SEO URL rewrite?
Posted By: avalight
Subject: Google Crawl Errors Relating to SEO URL rewrite?
Date Posted: 25-February-2011 at 6:39pm
Hello
I am running the new V4.1sp1 apparel add-on and have the Keyword Rich URL setting turned on so that user-friendly URLs are displayed. The other day I was in Google Webmaster Tools and noticed under Crawl Errors that all my product pages are showing a 404 error. Has anyone else looked at this in their Google account? It has me concerned, but I don't know whether it is a problem, or whether I should turn this feature off and recreate my sitemap.xml and storemap.asp files with the normal query strings.

Does anyone know about this, or is there something I am doing wrong?

Thanks
Curt



-------------
Curt



Replies:
Posted By: Brett
Date Posted: 25-February-2011 at 6:57pm
Hey Curt. Could you post a couple of the crawl errors here so I can have a better idea of exactly what's happening? There are many things which can lead to a 404 error.


Posted By: avalight
Date Posted: 25-February-2011 at 7:32pm
Here is a whole file full of them.
Each one is a product page, and if you paste the link into your browser, you get a 404 error. However, if you go through the category links on the website, as a user would, these very same links work fine.


uploads/667/Web_crawl_errors.txt






Posted By: Brett
Date Posted: 25-February-2011 at 8:32pm
The issue appears to be related to case sensitivity.

http://www.avalanche-ranch.com/rusticlighting/PC/Avalanche-Table-Lamp-Bear-304p35080.htm

http://www.avalanche-ranch.com/rusticlighting/pc/Avalanche-Table-Lamp-Bear-304p35080.htm

The second one works and has a lower-case pc, while the first, broken one has an upper-case PC. It's hard to be sure what is causing this without checking your 404.asp and related files such as pcSeoLinks.asp.
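One way to confirm a case-only mismatch like this is to compare the URLs from the crawl-error report against the URLs in the sitemap, ignoring case. Below is a minimal Python sketch, assuming both lists have already been exported to plain lists of strings. The function name case_only_mismatches is made up for illustration and is not part of ProductCart.

```python
def case_only_mismatches(error_urls, known_good_urls):
    """Return (bad, good) pairs where the two URLs differ only in letter case."""
    # Index the known-good URLs by their lower-cased form.
    good_by_lower = {url.lower(): url for url in known_good_urls}
    pairs = []
    for bad in error_urls:
        good = good_by_lower.get(bad.lower())
        # Same URL when case is ignored, but not an exact match.
        if good is not None and good != bad:
            pairs.append((bad, good))
    return pairs


# The two URLs quoted in this thread.
errors = [
    "http://www.avalanche-ranch.com/rusticlighting/PC/Avalanche-Table-Lamp-Bear-304p35080.htm",
]
sitemap = [
    "http://www.avalanche-ranch.com/rusticlighting/pc/Avalanche-Table-Lamp-Bear-304p35080.htm",
]

for bad, good in case_only_mismatches(errors, sitemap):
    print("case mismatch:", bad, "->", good)
```

If every crawl error turns up in this list, the links are probably being generated somewhere with the wrong case rather than pointing at pages that are genuinely missing.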


Posted By: Brett
Date Posted: 25-February-2011 at 8:40pm
So, if you just want it to *work* and for those links to forward, you can probably add this to your 404.asp at line 39:

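' InStr and Replace compare case-sensitively by default,
' so only an upper-case /PC/ is rewritten here
if instr(strQ,"/PC/")>0 then
     strQ = replace(strQ,"/PC/","/pc/")
end if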


What that should do is replace any /PC/ with /pc/ before the rest of the redirect is executed. What you might then end up with, however, is duplicate links. Here's a modified version of 404.asp with the change already made:
uploads/1159/404.zip - 404.zip
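For anyone who wants to check the effect of that replacement outside of ASP, here is a rough Python equivalent. It is only a sketch: fix_pc_case is a made-up name, and the second function is a suggested variation (not something from this thread) that also catches mixed case such as /Pc/, which a case-sensitive replace would leave alone.

```python
import re


def fix_pc_case(url):
    """Mirror of the 404.asp tweak: swap an upper-case /PC/ for /pc/."""
    if "/PC/" in url:
        url = url.replace("/PC/", "/pc/")
    return url


def fix_pc_case_any(url):
    """Variation (an assumption, not from the thread): also fix /Pc/ or /pC/."""
    return re.sub(r"/pc/", "/pc/", url, flags=re.IGNORECASE)


print(fix_pc_case("/rusticlighting/PC/Avalanche-Table-Lamp-Bear-304p35080.htm"))
print(fix_pc_case_any("/rusticlighting/Pc/Avalanche-Table-Lamp-Bear-304p35080.htm"))
```

Either way, the caveat about duplicate links still applies: both spellings of the URL would then resolve to the same page.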


*edit*

Please post your pcadmin/exportFroogle.asp file here. I bet you'll find something around line 568 which is making it generate those upper-case PC links:


'// SEO Links
'// Build Product Link
if scSeoURLs<>1 then
     strProductURL=SPathInfo & "pc/viewPrd.asp?idproduct=" & intIdProduct & "&idcategory=" & tmpIDCategory
else
     strProductURL=SPathInfo & "pc/" & removeChars(strProductName1) & "-" & tmpIDCategory & "p" & intIdProduct & ".htm"
end if
'//


That's how mine looks, and you can see that they're lower case. This is probably the issue, and it would probably be better to edit that file and fix the feed you're submitting to google rather than duplicate the links by accepting both upper and lower case.
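Going by the code above, a keyword-rich product URL ends in -<category id>p<product id>.htm. The Python sketch below pulls those IDs back out of a URL, which can help when matching crawl errors to products. parse_seo_url is a hypothetical helper, and the pattern is inferred from the snippet and the example URLs in this thread, not taken from ProductCart itself.

```python
import re

# <name>-<idcategory>p<idproduct>.htm, as built by the snippet above.
SEO_URL = re.compile(r"/(?P<name>[^/]+)-(?P<idcategory>\d+)p(?P<idproduct>\d+)\.htm$")


def parse_seo_url(url):
    """Return (name, idcategory, idproduct), or None if the URL does not fit."""
    match = SEO_URL.search(url)
    if match is None:
        return None
    return (
        match.group("name"),
        int(match.group("idcategory")),
        int(match.group("idproduct")),
    )


print(parse_seo_url(
    "http://www.avalanche-ranch.com/rusticlighting/pc/Avalanche-Table-Lamp-Bear-304p35080.htm"
))
```

A URL in the plain query-string form (viewPrd.asp?idproduct=...) does not match the pattern, so the function returns None for it.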


Posted By: Hamish
Date Posted: 25-February-2011 at 8:49pm
Hi, case sensitivity usually means Linux hosting. Is this on a cloud-based server? Some run virtual Windows on top of Linux, from what I understand.

-------------
Editing ProductCart Code?

See http://wiki.earlyimpact.com/developers/editcode - WIKI Guidelines for Editing ProductCart's ASP Source Code



Posted By: Brett
Date Posted: 25-February-2011 at 9:00pm
What I want to know is how the feed was generated with an upper-case PC folder in the first place. According to the code on my site, it should always be generated with a lower-case PC. Maybe he has an old or modified version of the file.


Posted By: avalight
Date Posted: 25-February-2011 at 11:05pm
Hello
I have looked at the sitemap.xml file, my storemap.asp file, and the storemap.html file I have on the site, and none of these files has a capitalized PC. Do you really think that makes a difference? This is on a semi-dedicated, Windows-based server at Tango Hosting. I have asked them to comment on this as well; they say it is a ProductCart developer problem.




Posted By: avalight
Date Posted: 25-February-2011 at 11:14pm
Originally posted by Brett:

What I want to know is how the feed was generated with an upper-case PC folder in the first place. According to the code on my site, it should always be generated with a lower-case PC. Maybe he has an old or modified version of the file.


Brett - what is the name of the file you are referring to here?




Posted By: Brett
Date Posted: 26-February-2011 at 1:21am
Sorry, I had the wrong file - that was the Google Shopping feed page. Go to genGoogleSiteMapA.asp and find, around line 40:

     if Right(SPathInfo,1)="/" then
          SPathInfo=SPathInfo & "pc/"          
     else
          SPathInfo=SPathInfo & "/pc/"
     end if


It should be in lower-case, but if yours is in upper-case that might be the cause of your problem.


Posted By: avalight
Date Posted: 26-February-2011 at 11:52am
Thanks Brett - I checked that out and did not find any capitalized PC.  My folder name is lowercase too.  I will do a search of my site for PC all caps later today. 



Posted By: TangoHosting
Date Posted: 26-February-2011 at 8:10pm
Originally posted by Brett:

The issue appears to be related to case sensitivity.

http://www.avalanche-ranch.com/rusticlighting/PC/Avalanche-Table-Lamp-Bear-304p35080.htm

http://www.avalanche-ranch.com/rusticlighting/pc/Avalanche-Table-Lamp-Bear-304p35080.htm

The second one works and it has a lower-case PC, while the first and broken one has an upper-case PC. It's hard to be sure what would be causing this without checking out your 404.asp and related files like pcSeoLinks.asp


Brett’s theory seems to be correct: the issue is that PC needs to be lower case in the URLs. We have confirmed this on another client’s ProductCart website with the 404 SEO implementation, on a separate web server from Curt’s avalanche-ranch.com (both servers are Windows 2003 SP2 / IIS 6).

As noted, we recommend updating your ProductCart system code to generate URLs with pc in lower case.

Tango Hosting
http://www.tangohosting.com




Posted By: whizzinpc
Date Posted: 28-February-2011 at 7:33pm
/PC/ could be somewhere in your template. You may want to download your site's header and footer, plus any static HTML pages you may have, look for PC in caps, and replace it with lower case. Alternatively, just update the file that Brett suggested so that it automatically rewrites to lower-case pc.
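The search described above can be automated on a local copy of the site. Here is a small Python sketch that walks a folder and reports every line containing /PC/ in capitals. The function name, the list of file extensions, and the demo file are all assumptions made for illustration; adjust them to whatever your copy of the site actually contains.

```python
import os
import tempfile


def find_uppercase_pc(root, needle="/PC/", extensions=(".asp", ".htm", ".html", ".xml")):
    """Yield (path, line_number, line) for every line under root containing needle."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.strip()


# Demo on a throwaway folder; point root at a local copy of the site instead.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "header.asp"), "w", encoding="utf-8") as demo:
        demo.write('<a href="/rusticlighting/PC/default.asp">Store</a>\n')
    for path, number, line in find_uppercase_pc(root):
        print(f"{os.path.basename(path)}:{number}: {line}")
```

The match is deliberately case-sensitive, so correct lower-case /pc/ links are not reported.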


Posted By: avalight
Date Posted: 01-March-2011 at 4:00pm
Well, I found the offending /PC/ - it was in my folder structure in Dreamweaver on my desktop.
Somewhere along the line, when I set up the mirrored site, I used a capitalized PC. I have changed that and it updated all the links, so I will wait to see if that clears it up.
Thanks all.




Posted By: TangoHosting
Date Posted: 01-March-2011 at 4:43pm
Originally posted by avalight:

Well, I found the offending /PC/ - it was in my folder structure in Dreamweaver on my desktop.
Somewhere along the line, when I set up the mirrored site, I used a capitalized PC. I have changed that and it updated all the links, so I will wait to see if that clears it up.
Thanks all.


Sounds like you found the problem, Curt. Let us know if there is anything else we can do on our end.


-------------
Tango Hosting
http://www.tangohosting.com
BBB A+

Authorized ProductCart Hosting
Over 6 Years of Professional ProductCart Hosting Starting at $6.95 Monthly.


Posted By: Rick_N
Date Posted: 04-May-2011 at 7:28pm
Hi there,
I have a similar issue that I thought was related to migrating servers, but it's been a while now and I see the same kind of problem as described in this thread, except that in my case the scPcFolder name is omitted and Google reports it as a crawl error.
 
The first link is what Google sees, with the missing scPcFolder, as seen through the Google Webmaster Tools Crawl Errors:
http://www.eveningsecrets.com/pc/viewPrd.asp?idproduct=5571

The next link is the actual link with the scPcFolder:
http://www.eveningsecrets.com/lingerie/pc/viewPrd.asp?idproduct=5571
 
My sitemap has all the correct links as does the store map. My custom 404 appears to work as expected but for the life of me I cannot figure out why it is omitting the scPcFolder.
 
In genGoogleSiteMapA the code seems correct between lines 31 and 47, but again the sitemap confirms this is working fine.
 
Any ideas?
 
Thanks.


-------------
EveningSecrets Lingerie...what 'every body' wants
http://www.eveningsecrets.com - EveningSecrets Lingerie


Posted By: TangoHosting
Date Posted: 04-May-2011 at 8:32pm

Hello Rick_N

Your issue does not appear to be related to Curt’s, which was a coding issue.

Please ensure you are calling the correct sitemap file in your webmaster tools account. For example, the sitemap URL below seems to include the correct URLs:

http://www.eveningsecrets.com/sitemap.xml

Google is indexing the 404 rewrite URLs:

Google.com > Search for site:www.eveningsecrets.com





Posted By: Rick_N
Date Posted: 05-May-2011 at 3:02pm
Hi TH,
thanks for the response. I only have the one siteMap.xml in the Webmaster Tools account. The URL that it points to is correct, i.e. http://www.eveningsecrets.com/siteMap.xml . As you can see in the image provided, everything seems correct.
 
 
Unless I am misunderstanding your point, I believe everything there is OK.
 
Thanks.
Rick




Posted By: TangoHosting
Date Posted: 05-May-2011 at 3:58pm
Hello Rick,

Please supply a screen-shot of the crawl errors.

Thank you,




Posted By: Rick_N
Date Posted: 05-May-2011 at 5:29pm
Hi TH,
 
OK as requested here is an image of the first few... a full csv file should be attached here as well.
 
Thanks.
 
 
http://www.earlyimpact.com/forum/uploads/239/Web_crawl_error404.zip
 
You will see there are a few in the list with the correct folder structure (/lingerie/pc/) although the majority of them have the /lingerie/ omitted. The ones with the correct structure I am not worried about as I typically see three or four in there occasionally.




Posted By: TangoHosting
Date Posted: 05-May-2011 at 9:46pm
Hello Rick,

Your robots.txt is restricting Google from crawling your web site.

http://www.eveningsecrets.com/robots.txt

In your robots.txt, update:

User-Agent: *
Disallow: /cleanStore/
Disallow: /lingeriev4/
Disallow: /testStore/
Disallow: /store/

To allow all robots to access your site:

User-agent: *
Disallow:

Thank you




Posted By: Rick_N
Date Posted: 06-May-2011 at 2:51pm
Hi again,
actually it's only restricting those folders you see. I get an average of 180 pages crawled per day from different bots. I don't think the problem is there. The robots.txt file is only disregarding those folders in the list. It still crawls the scPcFolder, which is /lingerie/.
 
I have a page rank of 3 and am listed on page one on several keyword searches. Again, unless I am misunderstanding, the robots.txt file is working as expected.
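This reading of the rules can be checked offline with Python's standard urllib.robotparser module. The following is just a sketch: the rules are the ones quoted earlier in this thread, the first two URLs are the ones from the crawl-error discussion, and the third is a made-up example of a path under a disallowed folder.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted earlier in this thread.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /cleanStore/
Disallow: /lingeriev4/
Disallow: /testStore/
Disallow: /store/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = (
    "http://www.eveningsecrets.com/lingerie/pc/viewPrd.asp?idproduct=5571",
    "http://www.eveningsecrets.com/pc/viewPrd.asp?idproduct=5571",
    "http://www.eveningsecrets.com/store/default.asp",  # hypothetical path
)
for url in urls:
    print(parser.can_fetch("Googlebot", url), url)
```

Only paths that start with one of the four listed folders are blocked, so neither /lingerie/pc/ nor /pc/ is affected by these rules.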




Posted By: TangoHosting
Date Posted: 06-May-2011 at 4:26pm
Hello Rick,

You're right. Did you create sitemaps or back door pages with the 404 error links? Try searching your site files for the 404 links.

Thank you






Posted By: Hamish
Date Posted: 06-May-2011 at 4:51pm
Hi, you could also go to Google and enter link:one-of-the-offending-urls

That should list the page(s) that link to the problem URLs, if they were found by Googlebot.





Posted By: Rick_N
Date Posted: 07-May-2011 at 10:13am
Originally posted by TangoHosting:

Hello Rick,

You're right. Did you create sitemaps or back door pages with the 404 error links?
 
Thanks again for the help folks!
I'm not sure I understand what you mean by that. I have not done anything with the error links... they are just there.

If I check the link in Google as suggested, I get no pages found, e.g. link:http://www.eveningsecrets.com/pc/Brazilian-Bikini-4p1732.htm
which is a page that one of the errors is linked from.
 
I have checked all my internal links that would relate to the product list and do not see anything odd. The storemap has all the correct links as well. I'll go through the site again, but it appears the link is being generated somewhere, as none of the pages that link to the offending URLs exist on my site.
 
Thanks.
Rick




Posted By: Rick_N
Date Posted: 07-May-2011 at 4:38pm
I found one file in my root folder called index.html, which is not used. There was a link in there as follows:
http://www.eveningsecrets.com/pc/default.asp
Is it possible the bot crawled this link and continued through creating the links without the scPcFolder? It's the only thing left I can think of. I removed the page altogether and I guess I'll have to wait and see.




Posted By: TangoHosting
Date Posted: 07-May-2011 at 5:35pm
Originally posted by Rick_N:

I found one file in my root folder called index.html, which is not used. There was a link in there as follows:
http://www.eveningsecrets.com/pc/default.asp
Is it possible the bot crawled this link and continued through creating the links without the scPcFolder? It's the only thing left I can think of. I removed the page altogether and I guess I'll have to wait and see.


Hello Rick,

It seems very unlikely that Google WMT would create and attempt to crawl links that do not exist. I'm pretty sure that there is a file, or files, that reference or once referenced the non-existent URLs in your web site.

Bottom-line: The issue is not related to the default ProductCart system code. You may find further assistance through the google webmaster tools group:

http://groups.google.com/group/Google_Webmaster_Help-Tools/topics

Thank you




Posted By: Rick_N
Date Posted: 08-May-2011 at 4:12pm
Hi TH,
no, I didn't think it had anything to do with the PC code, but I was hoping this might be obvious to someone. Everything was fine before I migrated to a new server. I checked to make sure I was returning a 404 status code, and everything appears to be set up as per the SEO instructions (PC WIKI).
 
I'll keep hunting for the file(s) and if I ever figure it out I'll let you know.
 
Thanks again for the help.




Posted By: TangoHosting
Date Posted: 08-May-2011 at 4:19pm
Hi Rick,

You're welcome, sorry we weren't able to point you to the solution.

Good luck






Posted By: Rick_N
Date Posted: 15-June-2011 at 6:34pm
OK, just had to rehash this one more time. I finally got around to checking out what is going on. It appears that any full URL (an absolute link) that I have in my pages shows correctly. If I use a relative link, such as in my header file for images, the link shows up bad: it does not include the directories above it.

Does this have to do with Parent Paths being enabled or disabled? The server I am on has parent paths enabled, and of course all the files I have for PC are PPE files. I attached another two images of some of the files that show as broken links.
 
 
 
 
Thanks in advance for any information.
 
 




Posted By: ProductCart
Date Posted: 15-June-2011 at 6:39pm
You must have those links somewhere. Do you have any page that is outside of the "lingerie/pc/" folder? If so, that's probably the issue.

Otherwise: are you submitting a sitemap? Review the URLs included in the sitemap. If there is an issue, make sure that the code has not been altered, and that the "includes/productcartfolder.asp" file includes the correct folder name.
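Reviewing the sitemap URLs by hand gets tedious on a large store, so here is a rough Python sketch that lists every sitemap URL falling outside the expected store folder. The function name is made up, and the sample XML is a hypothetical two-entry sitemap built from the URLs quoted in this thread, not the site's real sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_outside_folder(sitemap_xml, expected_prefix):
    """Return every <loc> URL in the sitemap that does not start with expected_prefix."""
    root = ET.fromstring(sitemap_xml)
    locs = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in locs if not url.startswith(expected_prefix)]


# Hypothetical sample; load the real sitemap.xml contents instead.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.eveningsecrets.com/lingerie/pc/viewPrd.asp?idproduct=5571</loc></url>
  <url><loc>http://www.eveningsecrets.com/pc/viewPrd.asp?idproduct=5571</loc></url>
</urlset>"""

for url in urls_outside_folder(sample, "http://www.eveningsecrets.com/lingerie/pc/"):
    print("outside store folder:", url)
```

An empty result suggests the sitemap is not the source of the bad links, and the search should move on to other pages or to external sites.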

-------------
The ProductCart Team

Home of ProductCart http://www.productcart.com - shopping cart software


Posted By: Rick_N
Date Posted: 15-June-2011 at 6:49pm
Hi and thanks for the quick response.
I have gone through every single page twice to find any offending links. I cannot find anything.
The includes folder is correct and the sitemap.xml generates perfect links, as does all the other navigation links.
This is an image from my header file
 
 
So if you look at the image in my prior post, any link that I have hardcoded in the file shows up correctly. The images that I reference only by a relative path, e.g. images/mcard.gif, show up as bad links in that same image.
 
Not sure what else you can think of but thanks again!
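A general point about relative links may be relevant here: a browser or crawler resolves a relative reference against the URL of the page it requested, not against where the file lives on the server. The Python sketch below shows that with the standard urljoin function. The "good" page URL is an assumption (the page name from the crawl-error example placed under /lingerie/pc/), used only for comparison.

```python
from urllib.parse import urljoin

# Relative reference, as used for images in the header file.
relative_src = "images/mcard.gif"

# Assumed correct page URL, and the same page as it appears in the crawl errors.
good_page = "http://www.eveningsecrets.com/lingerie/pc/Brazilian-Bikini-4p1732.htm"
bad_page = "http://www.eveningsecrets.com/pc/Brazilian-Bikini-4p1732.htm"

print(urljoin(good_page, relative_src))
print(urljoin(bad_page, relative_src))
```

The first call gives a URL under /lingerie/pc/images/, while the second gives one under /pc/images/. So once a crawler arrives through a URL that is missing /lingerie/, every relative link on that page inherits the same omission, while absolute links are unaffected.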




Posted By: whizzinpc
Date Posted: 15-June-2011 at 7:24pm
Our Webmaster Tools reported broken links that we couldn't find anywhere. I later found out those links were coming from other sites that link to us. The problem, I think, is that Googlebot keeps trying to browse the site from the broken URL. The absolute URLs are obviously going to work, but the relative URLs will continue to be broken.
That's my theory. Do a Google search to see if you can find a site that has an incorrect link to you. We've fixed these using ISAPI to redirect the original broken link to a fixed one. For example, a link would come in missing part of the URL; in your case it's missing the lingerie part:
http://www.eveningsecrets.com/pc/Brazilian-Bikini-4p1732.ht
If your links are relative, then it will never put the lingerie back; you need to use ISAPI to automatically add the lingerie back to the URL. This is how we corrected the issue. Here is what our code looks like. This is with Rewrite v2; I'm not sure if it works on the latest version. Basically we are redirecting any URL that is completely missing /productcart/pc/, only missing /productcart/ or only missing /pc/ to the correct structure.

RewriteRule /([^./]+\.htm) http://www.abc.com/productcart/pc/$1 [I,RP]
RewriteRule /productcart/([^./]+)\.htm http\://www.abc.com/productcart/pc/$1.htm [I,RP]
RewriteRule /pc/([^./]+)\.htm http\://www.abc.com/productcart/pc/$1.htm [I,RP]
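To see which incoming paths those three rules would catch, they can be approximated with Python regular expressions. This is only a rough emulation for testing ideas, not ISAPI_Rewrite itself: it assumes each pattern has to match the whole path, uses re.IGNORECASE in place of the I flag, and simply returns the redirect target where the RP flag would issue a permanent redirect. The rewrite function is a made-up helper.

```python
import re

TARGET = "http://www.abc.com/productcart/pc/"

# (pattern, replacement) pairs mirroring the three rules above.
RULES = [
    (r"/([^./]+\.htm)", TARGET + r"\1"),
    (r"/productcart/([^./]+)\.htm", TARGET + r"\1.htm"),
    (r"/pc/([^./]+)\.htm", TARGET + r"\1.htm"),
]


def rewrite(path):
    """Return the redirect target for path, or None if no rule applies."""
    for pattern, replacement in RULES:
        # fullmatch reflects the assumption that the whole path must match.
        match = re.fullmatch(pattern, path, flags=re.IGNORECASE)
        if match:
            return match.expand(replacement)
    return None


for path in (
    "/Brazilian-Bikini-4p1732.htm",
    "/productcart/Brazilian-Bikini-4p1732.htm",
    "/pc/Brazilian-Bikini-4p1732.htm",
    "/productcart/pc/Brazilian-Bikini-4p1732.htm",
):
    print(path, "->", rewrite(path))
```

The last path is already correct, so no rule matches it and it is left alone. For a store under /lingerie/pc/, the folder names and host in the rules would need to be changed accordingly.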




Posted By: Rick_N
Date Posted: 15-June-2011 at 7:43pm
Fantastic information! That sounds like a great route to start from. I'll hunt down external links from other sites and see what is going on. In the meantime I'll fire up IIS and do the rewrites as you mentioned. Thanks for going the extra mile and providing the string.
Thanks again for your valuable input and time.




