Google Crawl Errors Relating to SEO URL rewrite?
Printed From: ProductCart E-Commerce Solutions
Category: ProductCart
Forum Name: Using ProductCart
Forum Description: Running your store with ProductCart
URL: https://forum.productcart.com/forum_posts.asp?TID=4338
Printed Date: 04-December-2024 at 9:41pm
Software Version: Web Wiz Forums 12.04 - http://www.webwizforums.com
Topic: Google Crawl Errors Relating to SEO URL rewrite?
Posted By: avalight
Subject: Google Crawl Errors Relating to SEO URL rewrite?
Date Posted: 25-February-2011 at 6:39pm
Hello, I am running the new V4.1sp1 apparel add-on and have the Keyword Rich URL setting turned on so that user-friendly URLs are displayed. The other day I was in my Google Webmaster Tools account and noticed under Crawl Errors that all my product pages are showing a 404 error. Has anyone else looked at this in their Google account? It has me concerned, but I don't know whether it is a problem, or whether I should turn off this feature and recreate my sitemap.xml and storemap.asp files with the normal query strings.
Does anyone know about this, or is there something I am doing wrong?
Thanks Curt
------------- Curt
|
Replies:
Posted By: Brett
Date Posted: 25-February-2011 at 6:57pm
Hey Curt. Could you post a couple of the crawl errors here so I can have a better idea of exactly what's happening? There are many things which can lead to a 404 error.
|
Posted By: avalight
Date Posted: 25-February-2011 at 7:32pm
Here is a whole file full of them. Each one is a product page, and if you paste the link into your browser, you get a 404 error. However, if you go through the category links on the website, as a user would, these very same links work fine.
uploads/667/Web_crawl_errors.txt
|
Posted By: Brett
Date Posted: 25-February-2011 at 8:32pm
The issue appears to be related to case sensitivity.
http://www.avalanche-ranch.com/rusticlighting/PC/Avalanche-Table-Lamp-Bear-304p35080.htm
http://www.avalanche-ranch.com/rusticlighting/pc/Avalanche-Table-Lamp-Bear-304p35080.htm
The second one works and it has a lower-case PC, while the first and broken one has an upper-case PC. It's hard to be sure what would be causing this without checking out your 404.asp and related files like pcSeoLinks.asp
|
Posted By: Brett
Date Posted: 25-February-2011 at 8:40pm
So, if you just want it to *work* and for those links to forward, you can probably add this to your 404.asp at line 39:
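' Normalize an upper-case /PC/ folder to /pc/ before the redirect logic below runs
' (InStr and Replace compare case-sensitively by default, so only "/PC/" is matched)
if instr(strQ,"/PC/")>0 then
strQ = replace(strQ,"/PC/","/pc/")
end if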
|
What that should do is replace any /PC/ with /pc/ before the rest of the redirect is executed. What you might then end up with, however, is duplicate links. Here's a modified version of 404.asp with the change already made:
uploads/1159/404.zip - 404.zip
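Brett's concern about duplicate links can be illustrated outside ProductCart. The Python below (an illustration only, not the store's ASP code) treats any path segment that spells pc in any case as the store folder, rewrites it to lower case, and reports whether the request should get a 301 redirect to the lower-case URL instead of the page being served at both addresses. The folder name "pc" and the function name `canonicalize` are assumptions for the example.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical spelling of the store folder (ProductCart builds links
# with a lower-case "pc/").
CANONICAL_FOLDER = "pc"


def canonicalize(url: str) -> Tuple[str, bool]:
    """Return (canonical_url, needs_redirect).

    Every path segment equal to "pc" when case is ignored is rewritten to the
    lower-case form. needs_redirect is True when the path changed, meaning the
    request should be answered with a 301 to canonical_url.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    fixed = [CANONICAL_FOLDER if seg.lower() == CANONICAL_FOLDER else seg
             for seg in segments]
    new_path = "/".join(fixed)
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )
    return canonical, new_path != parts.path
```

Unlike the `InStr` check, this also catches mixed spellings such as /Pc/, and redirecting (rather than silently rewriting) leaves Google with a single URL per product.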
*edit*
Please post your pcadmin/exportFroogle.asp file here. I bet you'll find something around line 568 that is making it generate those upper-case PC links:
'// SEO Links
'// Build Product Link
if scSeoURLs<>1 then
strProductURL=SPathInfo & "pc/viewPrd.asp?idproduct=" & intIdProduct & "&idcategory=" & tmpIDCategory
else
strProductURL=SPathInfo & "pc/" & removeChars(strProductName1) & "-" & tmpIDCategory & "p" & intIdProduct & ".htm"
end if
'//
|
That's how mine looks, and you can see that they're lower case. This is probably the issue, and it would probably be better to edit that file and fix the feed you're submitting to Google rather than duplicate the links by accepting both upper and lower case.
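The SEO branch above builds product URLs of the form `<SPathInfo>pc/<name>-<idCategory>p<idProduct>.htm`. A minimal Python sketch of that format, plus the reverse step of pulling the IDs back out of a URL, follows. `make_slug` is a hypothetical stand-in for ProductCart's `removeChars`, whose exact rules are not shown in this thread.

```python
import re
from typing import Optional, Tuple


def make_slug(name: str) -> str:
    # Hypothetical stand-in for ProductCart's removeChars(): collapse every
    # run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")


def product_url(s_path_info: str, name: str, id_category: int, id_product: int) -> str:
    # Same concatenation as the SEO branch above:
    # SPathInfo & "pc/" & slug & "-" & idCategory & "p" & idProduct & ".htm"
    return f"{s_path_info}pc/{make_slug(name)}-{id_category}p{id_product}.htm"


# The IDs sit at the very end of the file name, so they can be recovered even
# when the slug itself contains hyphens or digits.
PRODUCT_RE = re.compile(r"-(\d+)p(\d+)\.htm$", re.IGNORECASE)


def parse_ids(url: str) -> Optional[Tuple[int, int]]:
    """Return (idCategory, idProduct) from a keyword-rich URL, or None."""
    m = PRODUCT_RE.search(url)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Because the IDs carry the real information, the folder's case has no effect on which product is meant, which is why the upper-case links looked valid to Google while still returning 404.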
|
Posted By: Hamish
Date Posted: 25-February-2011 at 8:49pm
Hi, case sensitivity usually means Linux hosting. Is this on a cloud-based server? From what I understand, some run virtual Windows on top of Linux.
------------- Editing ProductCart Code?
See WIKI Guidelines for Editing ProductCart's ASP Source Code: http://wiki.earlyimpact.com/developers/editcode
|
Posted By: Brett
Date Posted: 25-February-2011 at 9:00pm
What I want to know is how the feed was generated with an upper-case PC folder in the first place. According to the code on my site, it should always be generated with a lower-case PC. Maybe he has an old or modified version of the file.
|
Posted By: avalight
Date Posted: 25-February-2011 at 11:05pm
Hello, I have looked at the sitemap.xml file, my storemap.asp file and the storemap.html file I have on the site, and none of these files has a capitalized PC. Do you really think that makes a difference? This is on a semi-dedicated, Windows-based machine at Tango Hosting. I have asked them to comment on this as well; they say it is a ProductCart developer problem.
|
Posted By: avalight
Date Posted: 25-February-2011 at 11:14pm
Brett wrote:
What I want to know is how the feed was generated with an upper-case PC folder in the first place. According to the code on my site, it should always be generated with a lower-case PC. Maybe he has an old or modified version of the file. |
Brett, what is the name of the file you are referring to here?
|
Posted By: Brett
Date Posted: 26-February-2011 at 1:21am
Sorry, I had the wrong file; that was the Google Shopping feed page. Go to genGoogleSiteMapA.asp and find around line 40:
if Right(SPathInfo,1)="/" then
SPathInfo=SPathInfo & "pc/"
else
SPathInfo=SPathInfo & "/pc/"
end if
|
It should be in lower-case, but if yours is in upper-case that might be the cause of your problem.
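Since the question comes down to whether any generated URL contains an upper-case PC folder, one quick check is to parse the generated sitemap itself. This Python sketch assumes a standard sitemaps.org `urlset` file and returns every `<loc>` whose folder segment matches pc only when case is ignored; the function name and default folder are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

# Namespace defined by the sitemaps.org protocol (version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def bad_case_locs(sitemap_xml: Union[str, bytes], folder: str = "pc") -> List[str]:
    """Return every <loc> URL containing a path segment that equals `folder`
    when case is ignored but is not spelled exactly as `folder`."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.iterfind("sm:url/sm:loc", namespaces=NS):
        url = (loc.text or "").strip()
        if any(seg.lower() == folder and seg != folder for seg in url.split("/")):
            bad.append(url)
    return bad
```

For a local copy of the file, something like `bad_case_locs(open("sitemap.xml", "rb").read())` would do; an empty list means the sitemap is not the source of the upper-case links.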
|
Posted By: avalight
Date Posted: 26-February-2011 at 11:52am
Thanks Brett - I checked that out and did not find any capitalized PC. My folder name is lowercase too. I will do a search of my site for PC all caps later today.
|
Posted By: TangoHosting
Date Posted: 26-February-2011 at 8:10pm
Brett wrote:
The issue appears to be related to case sensitivity.
http://www.avalanche-ranch.com/rusticlighting/PC/Avalanche-Table-Lamp-Bear-304p35080.htm
http://www.avalanche-ranch.com/rusticlighting/pc/Avalanche-Table-Lamp-Bear-304p35080.htm
The second one works and it has a lower-case PC, while the first and broken one has an upper-case PC. It's hard to be sure what would be causing this without checking out your 404.asp and related files like pcSeoLinks.asp |
Brett's theory seems to be correct: PC needs to be lower case in the URLs. We have confirmed this on another client's ProductCart website with the 404 SEO implementation, on a separate web server from Curt's avalanche-ranch.com (both servers are Windows 2003 SP2 / IIS 6).
As noted, we recommend updating your ProductCart system code to generate URLs with pc in lower case.
Tango Hosting
http://www.tangohosting.com
|
Posted By: whizzinpc
Date Posted: 28-February-2011 at 7:33pm
/PC/ could be somewhere in your template. You may want to download your site's header and footer and any static HTML pages you have, look for PC in caps, and replace it with lowercase... or just update the file that Brett suggested to automatically rewrite to lowercase pc.
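The search suggested above can be automated. This Python sketch walks a local copy of the site (for example the Dreamweaver mirror) and reports every line containing the exact, case-sensitive text /PC/. The list of file extensions is an assumption; adjust it to the files your site actually uses.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed set of file types worth searching; extend as needed.
SUFFIXES = (".htm", ".html", ".asp", ".inc", ".xml", ".dwt")


def find_uppercase_pc(root: str) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line) for every line under `root` that
    contains the exact, case-sensitive text "/PC/"."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if "/PC/" in line:
                    hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling it as `find_uppercase_pc("path/to/local/site")` lists each offending line with its file and line number.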
|
Posted By: avalight
Date Posted: 01-March-2011 at 4:00pm
Well, I found the offending /PC/. It was in the folder structure of my Dreamweaver site on my desktop: somewhere along the line, when I set up the mirrored site, I used a capitalized PC. I have changed that and it updated all the links, so I will wait to see if that clears it up. Thanks, all.
|
Posted By: TangoHosting
Date Posted: 01-March-2011 at 4:43pm
avalight wrote:
Well, I found the offending /PC/ - it was in my folder structure on my dreamweaver software on my desktop. Somewhere along the line when i set up the mirrored site I used a capitalized PC. So have changed that and it updated all the links, so I will wait to see it that clears it up. Thanks all.
|
Sounds like you found the problem, Curt. Let us know if there is anything else we can do on our end.
------------- Tango Hosting http://www.tangohosting.com BBB A+
Authorized ProductCart Hosting. Over 6 Years of Professional ProductCart Hosting. Starting at $6.95 Monthly.
|
Posted By: Rick_N
Date Posted: 04-May-2011 at 7:28pm
Hi there,
I have a similar issue that I thought was related to migrating servers, but it's been a while now and I see the same issues as mentioned in this post, except that my scPcFolder name is omitted and Google shows it as a crawl error.
The first link is what Google sees, with the missing scPcFolder, as seen through the Google Webmaster Tools Crawl Errors
http://www.eveningsecrets.com/pc/viewPrd.asp?idproduct=5571
The next link is the actual link with the scPcFolder
http://www.eveningsecrets.com/lingerie/pc/viewPrd.asp?idproduct=5571
My sitemap has all the correct links as does the store map. My custom 404 appears to work as expected but for the life of me I cannot figure out why it is omitting the scPcFolder.
In genGoogleSiteMapA the code seems correct between lines 31 and 47, and again, the sitemap confirms that this is working fine.
Any ideas?
Thanks.
------------- EveningSecrets Lingerie...what 'every body' wants
http://www.eveningsecrets.com - EveningSecrets Lingerie
|
Posted By: TangoHosting
Date Posted: 04-May-2011 at 8:32pm
Hello Rick_N
Your issue does not appear to be related to Curt's, which was a coding issue.
Please make sure you are referencing the correct sitemap file in your Webmaster Tools account. For example, the sitemap URL below seems to include the correct URLs:
http://www.eveningsecrets.com/sitemap.xml
Google is indexing the 404 rewrite URLs:
Google.com > Search for site:www.eveningsecrets.com
|
Posted By: Rick_N
Date Posted: 05-May-2011 at 3:02pm
Hi TH,
Thanks for the response. I only have the one siteMap.xml in the Webmaster Tools account. The URL it points to is correct, i.e. http://www.eveningsecrets.com/siteMap.xml. As you can see in the image provided, everything seems correct.
Unless I am misunderstanding your point, I believe everything there is OK.
Thanks.
Rick
|
Posted By: TangoHosting
Date Posted: 05-May-2011 at 3:58pm
Hello Rick,
Please supply a screen-shot of the crawl errors.
Thank you,
|
Posted By: Rick_N
Date Posted: 05-May-2011 at 5:29pm
Hi TH,
OK as requested here is an image of the first few... a full csv file should be attached here as well.
Thanks.
http://www.earlyimpact.com/forum/uploads/239/Web_crawl_error404.zip - uploads/239/Web_crawl_error404.zip
You will see there are a few in the list with the correct folder structure (/lingerie/pc/) although the majority of them have the /lingerie/ omitted. The ones with the correct structure I am not worried about as I typically see three or four in there occasionally.
|
Posted By: TangoHosting
Date Posted: 05-May-2011 at 9:46pm
Hello Rick,
Your robots.txt is restricting Google from crawling your web site.
http://www.eveningsecrets.com/robots.txt
In your robots.txt, update:
User-Agent: *
Disallow: /cleanStore/
Disallow: /lingeriev4/
Disallow: /testStore/
Disallow: /store/
To allow all robots to access your site:
User-agent: *
Disallow:
Thank you
|
Posted By: Rick_N
Date Posted: 06-May-2011 at 2:51pm
Hi again,
Actually, it's only restricting the folders you see. I get an average of 180 pages crawled per day by different bots, so I don't think the problem is there. The robots.txt file only excludes the folders in that list; the bots still crawl the scPcFolder, which is /lingerie/.
I have a page rank of 3 and am listed on page one on several keyword searches. Again, unless I am misunderstanding, the robots.txt file is working as expected.
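Rick's reading of the file can be checked with Python's standard `urllib.robotparser`. The sketch below feeds it the Disallow rules quoted above, written one directive per line, and asks whether Googlebot may fetch two sample URLs. Only paths that begin with one of the listed folders are blocked, so /lingerie/ stays crawlable while /lingeriev4/ does not.

```python
from urllib import robotparser

# The Disallow rules quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cleanStore/
Disallow: /lingeriev4/
Disallow: /testStore/
Disallow: /store/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """True if the rules above let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


for url in (
    "http://www.eveningsecrets.com/lingerie/pc/viewPrd.asp?idproduct=5571",
    "http://www.eveningsecrets.com/lingeriev4/pc/viewPrd.asp?idproduct=5571",
):
    print(url, allowed(url))
```

`robotparser` does plain prefix matching; Googlebot's own parser supports extra wildcard syntax, but none is used in these rules.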
|
Posted By: TangoHosting
Date Posted: 06-May-2011 at 4:26pm
Hello Rick,
You're right. Did you create sitemaps or back door pages with the 404 error links? Try searching your site files for the 404 links.
Thank you
|
Posted By: Hamish
Date Posted: 06-May-2011 at 4:51pm
Hi, you could also go to Google and enter link:one-of-the-offending-urls
That should list the page(s) that link to the problem URLs, if they were found by Googlebot.
|
Posted By: Rick_N
Date Posted: 07-May-2011 at 10:13am
TangoHosting wrote:
Hello Rick,
You're right. Did you create sitemaps or back door pages with the 404 error links?
|
Thanks again for the help folks!
I'm not sure I understand what you mean. I have not done anything with the error links; they are just there.
If I check the link as mentioned in Google, I get no pages found, e.g. link:http://www.eveningsecrets.com/pc/Brazilian-Bikini-4p1732.htm
which is a page that one of the errors is linked from.
I have checked all my internal links that relate to the product list and do not see anything odd. The storemap has all the correct links as well. I'll go through the site again, but it appears the link is being generated somewhere, since none of the pages that link to the offending URLs exist on my site.
Thanks.
Rick
|
Posted By: Rick_N
Date Posted: 07-May-2011 at 4:38pm
I found one file in my root folder called index.html, which is not used. There was a link in there as follows:
http://www.eveningsecrets.com/pc/default.asp
Is it possible the bot crawled this link and continued through creating the links without the scPcFolder? It's the only thing left I can think of. I removed the page altogether and I guess I'll have to wait and see.
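One way to look for stray links like the one in that unused index.html is to parse each page and resolve its links against the address the page is served from. This Python sketch (standard library only; the names are illustrative) flags same-host links whose path falls outside a given store prefix such as /lingerie/. It is a rough filter: legitimate root-level files will be flagged too, so treat the output as a list to review rather than a list of errors.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the raw href/src attribute values from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def links_outside(html: str, page_url: str, store_prefix: str) -> List[str]:
    """Resolve each link against page_url and return the same-host ones
    whose path does not start with store_prefix (e.g. "/lingerie/")."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlsplit(page_url).netloc
    flagged = []
    for link in collector.links:
        absolute = urljoin(page_url, link)
        parts = urlsplit(absolute)
        if parts.netloc == host and not parts.path.startswith(store_prefix):
            flagged.append(absolute)
    return flagged
```

Run against that root index.html, the /pc/default.asp link would be reported, since its path does not start with /lingerie/.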
|
Posted By: TangoHosting
Date Posted: 07-May-2011 at 5:35pm
Rick_N wrote:
I found one file in my root folder called index.html, which is not used. There was a link in there as follows:
http://www.eveningsecrets.com/pc/default.asp
Is it possible the bot crawled this link and continued through creating the links without the scPcFolder? It's the only thing left I can think of. I removed the page altogether and I guess I'll have to wait and see. |
Hello Rick,
It seems very unlikely that Google WMT would create and attempt to crawl links that do not exist. I'm pretty sure there is a file or files that are (or were) referencing the non-existent URLs on your web site.
Bottom line: the issue is not related to the default ProductCart system code. You may find further assistance through the Google Webmaster Tools group:
http://groups.google.com/group/Google_Webmaster_Help-Tools/topics
Thank you
|
Posted By: Rick_N
Date Posted: 08-May-2011 at 4:12pm
Hi TH,
No, I didn't think it had anything to do with the PC code, but I was hoping this might be obvious to someone. Everything was fine before I migrated to a new server. I checked that I was returning a 404 status code, and everything appears to be set up as per the SEO instructions (PC WIKI).
I'll keep hunting for the file(s) and if I ever figure it out I'll let you know.
Thanks again for the help.
|
Posted By: TangoHosting
Date Posted: 08-May-2011 at 4:19pm
Hi Rick,
You're welcome, sorry we weren't able to point you to the solution.
Good luck
|
Posted By: Rick_N
Date Posted: 15-June-2011 at 6:34pm
OK, just had to rehash this one more time. I finally got around to checking out what is going on. It appears that any full (absolute) URL in my pages shows correctly. If I use a relative link, such as the images in my header file, the link shows up bad: it does not include the directories above it.
Has this to do with Parent Paths Enabled or Disabled? The server I am on has parent paths enabled and of course all the files I have for PC are PPE files. I attached another two images of some of the files that show as broken links.
Thanks in advance for any information.
|
Posted By: ProductCart
Date Posted: 15-June-2011 at 6:39pm
You must have those links somewhere. Do you have any page that is outside of the "lingerie/pc/" folder? If so, that's probably the issue.
Otherwise: are you submitting a sitemap? Review the URLs included in the sitemap. If there is an issue, make sure that the code has not been altered, and that the "includes/productcartfolder.asp" file includes the correct folder name.
------------- The ProductCart Team
Home of ProductCart shopping cart software: http://www.productcart.com
|
Posted By: Rick_N
Date Posted: 15-June-2011 at 6:49pm
Hi and thanks for the quick response.
I have gone through every single page twice to find any offending links. I cannot find anything.
The includes folder is correct and the sitemap.xml generates perfect links, as does all the other navigation links.
This is an image from my header file
So if you look at the image in my prior post, any link that I have hardcoded in the file shows correctly. The images, where I only reference the image directory where the file is located, i.e. images/mcard.gif, show up as bad links in the image in the prior post.
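What Rick describes matches how browsers and crawlers resolve relative references: against the URL the page was requested at, not the folder the file lives in on disk. The Python sketch below uses `urllib.parse.urljoin` with the image path from the header. It assumes, as one possible explanation, that the page content is returned at an address missing /lingerie/ (for example to a crawler following a bad inbound link) instead of being redirected first; the root-relative path is likewise an assumed location.

```python
from urllib.parse import urljoin

# Relative image reference from the header include.
RELATIVE_IMG = "images/mcard.gif"

# The product page at its real address, inside /lingerie/pc/.
GOOD_BASE = "http://www.eveningsecrets.com/lingerie/pc/Brazilian-Bikini-4p1732.htm"
# The same page as requested through a bad address that lacks /lingerie/.
BAD_BASE = "http://www.eveningsecrets.com/pc/Brazilian-Bikini-4p1732.htm"

print(urljoin(GOOD_BASE, RELATIVE_IMG))  # http://www.eveningsecrets.com/lingerie/pc/images/mcard.gif
print(urljoin(BAD_BASE, RELATIVE_IMG))   # http://www.eveningsecrets.com/pc/images/mcard.gif

# A root-relative reference (assumed location) resolves identically from either address.
ROOT_RELATIVE_IMG = "/lingerie/pc/images/mcard.gif"
print(urljoin(GOOD_BASE, ROOT_RELATIVE_IMG) == urljoin(BAD_BASE, ROOT_RELATIVE_IMG))  # True
```

This is why the hardcoded absolute links stay correct while the relative ones inherit the missing folder from whatever address the crawler used.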
Not sure what else you can think of but thanks again!
|
Posted By: whizzinpc
Date Posted: 15-June-2011 at 7:24pm
Our Webmaster Tools account reported broken links that we couldn't find anywhere... I later found out those links were coming from other sites that link to us. The problem, I think, is that Googlebot keeps trying to browse the site with the broken URL. The absolute URLs are obviously going to work, but the relative URLs will continue to be broken.
That's my theory. Do a Google search to see whether you can find a site that has an incorrect link to you. We've fixed these by using ISAPI_Rewrite to redirect the original broken link to a fixed one. For example, a link would come in missing part of the URL; in your case it's missing the lingerie part:
http://www.eveningsecrets.com/pc/Brazilian-Bikini-4p1732.ht
If your links are relative, the lingerie part will never be put back, so you need ISAPI_Rewrite to add lingerie back to the URL automatically. This is how we corrected the issue. Here is what our code looks like. This is with ISAPI_Rewrite v2; I'm not sure whether it works on the latest version. Basically, we redirect any URL that is completely missing /productcart/pc/, only missing /productcart/, or only missing /pc/ to the correct structure.
RewriteRule /([^./]+\.htm) http://www.abc.com/productcart/pc/$1 [I,RP]
RewriteRule /productcart/([^./]+)\.htm http\://www.abc.com/productcart/pc/$1.htm [I,RP]
RewriteRule /pc/([^./]+)\.htm http\://www.abc.com/productcart/pc/$1.htm [I,RP]
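To see which requests those three rules catch, here is a Python approximation. It assumes ISAPI_Rewrite 2 semantics as we understand them: each pattern must match the whole request URL, [I] makes matching case-insensitive, and [RP] answers with a permanent (301) redirect. www.abc.com and /productcart/ are the placeholders from the post; substitute your own domain and scPcFolder.

```python
import re
from typing import Optional

# Python approximation of the three rules above. Assumes the pattern must
# match the whole URL path ([I] = ignore case) and that [RP] sends a 301.
# www.abc.com and /productcart/ are the placeholders used in the post.
RULES = [
    (re.compile(r"/([^./]+\.htm)", re.IGNORECASE),
     r"http://www.abc.com/productcart/pc/\1"),
    (re.compile(r"/productcart/([^./]+)\.htm", re.IGNORECASE),
     r"http://www.abc.com/productcart/pc/\1.htm"),
    (re.compile(r"/pc/([^./]+)\.htm", re.IGNORECASE),
     r"http://www.abc.com/productcart/pc/\1.htm"),
]


def redirect_target(url_path: str) -> Optional[str]:
    """Return the 301 target of the first rule whose pattern matches the
    whole path, or None if the request should be served unchanged."""
    for pattern, target in RULES:
        match = pattern.fullmatch(url_path)
        if match:
            return match.expand(target)
    return None
```

Whole-URL matching matters here: if the patterns could match anywhere in the URL, the first rule would also match the tail of an already correct /productcart/pc/... address and redirect it to itself in a loop.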
|
|
Posted By: Rick_N
Date Posted: 15-June-2011 at 7:43pm
Fantastic information! That sounds like a great route to start from. I'll hunt down external links from other sites and see what is going on. In the meantime I'll fire up IIS and do the rewrites as you mentioned. Thanks for going the extra mile and providing the string.
Thanks again for your valuable input and time.
|
|