
Free SEO Crawler and Log Analysis tool worth your time (web based) – Seolyzer

Seolyzer impressed me right from the start, and while it’s not open source, it’s free and its talented owner is committed to offering a free plan in the future. Right now it offers a crawler and log analysis, monitors robots.txt and calculates PageRank. It deserves coverage and it’s definitely worth your time; head straight to Seolyzer right now, or keep reading to learn more about the tool and its talented founder (and one-man army), Olivier Papon.

Why I like this tool

It’s simple, intuitive and useful for SEOs at any experience level. Log files can be collected in real time or via upload, the crawler is quick, and the reports are straightforward and extensible (additional data is available in the column tabs). It doesn’t feel heavy or clunky, and I haven’t seen a single error, which is a feat on its own. The tool is obviously still in its infancy and lacks some features (the crawler should render the full DOM – this is definitely on his feature list), but we have to remember that Olivier is doing this on his own – which is pretty incredible.

About the founder

1)  What’s your background?

I’m 36 years old and have 14 years of experience in the SEO world. I have a master’s degree in web projects. I started working in SEO with my first job, and it never stopped.
I launched a free log analyzer and crawler (Seolyzer.io) in November 2017.

2) Are you a developer first, SEO second? What languages do you program in?

I had the chance to learn development during my studies, but I’m definitely an SEO first!
Development is just a way to meet a need, in this case SEO needs. PHP or C++, it depends.

3) Why did you feel the need to make Seolyzer? Aren’t there already solutions in the SEO space?

As an SEO, I needed live data. When you push 10k redirections live, you have to verify Googlebot’s reaction in real time; you can’t wait for potential errors to appear in Search Console, that would be a disaster. No other tool on the market does real time, and that’s the first reason. The second is that other tools are so expensive that they are intended for large budgets.

4) The tool is free, which is a bit unbelievable given you’re probably storing a lot of data. Are you planning to make this a paid service?

Exactly, we process 20M hits per day. That’s a lot of data, a lot of hardware resources and a lot of specific development. Seolyzer.io has been in beta for almost 2 years and will naturally have to switch to a freemium model, an affordable one.

5) Are you developing this on your own?

Yes, I am! A developer should join me very soon so that I can focus on the SEO value of the tool.

About the tool

1) About your crawler, are you rendering the DOM? If so, are you using Phantom JS or some other headless browser and how long is your timeout?

Not yet, a lot of other features are in progress. We plan to use headless Chrome, to be as close as possible to Google’s process.
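
(Note from Dave: Olivier hasn’t shared his rendering stack, but to illustrate the idea, here’s a minimal sketch of grabbing the rendered DOM with headless Chrome via Playwright – my assumption, not Seolyzer’s implementation – with an explicit timeout.)

```python
# Assumes Playwright is installed: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright


def fetch_rendered_html(url, timeout_ms=10_000):
    """Load a page in headless Chromium and return the rendered DOM as HTML."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait for the network to go quiet, up to the timeout, before reading the DOM.
        page.goto(url, timeout=timeout_ms, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


if __name__ == "__main__":
    print(len(fetch_rendered_html("https://example.com")))
```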

2) What is the limit of the crawl? What if I have 10 million pages?

I love this question! I’ve spent a lot of time on this subject; only SaaS crawlers are able to crawl that number of pages. Our record is 22M pages on one site.

3) Looks like you’re calculating PageRank, which is great, but how are you doing this? As skeptics, SEOs need to know how you’re doing this so we can trust it.

A good SEO has to be skeptical anyway, right? 🙂 It’s very simple: I used Google’s original PageRank patent.
The formula can be found right here: https://en.wikipedia.org/wiki/PageRank#Iterative
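
(Note from Dave: for anyone who wants to sanity-check the numbers, here’s a minimal sketch of that iterative PageRank formula with the usual 0.85 damping factor – my own illustration, not Seolyzer’s code.)

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank; `links` maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    pr = {page: 1.0 / n for page in pages}  # start with a uniform distribution
    for _ in range(iterations):
        new_pr = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page shares its rank equally among the pages it links to.
                share = damping * pr[page] / len(targets)
                for target in targets:
                    new_pr[target] += share
            else:
                # Dangling page (no outlinks): spread its rank evenly over all pages.
                for target in pages:
                    new_pr[target] += damping * pr[page] / n
        pr = new_pr
    return pr


if __name__ == "__main__":
    # Tiny hypothetical site: the home page links to two pages, which link back to home.
    print(pagerank({"home": ["a", "b"], "a": ["home"], "b": ["home"]}))
```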

4) I want real time logging, but I don’t want to use PHP to do so. Are you going to provide JavaScript (example for Node JS) or other language solutions?

The best way is to process logs directly on the server with our Seolyzer agent. If your site is full-JS (a rendering stack) as far as Googlebot is concerned, then developing a specific tag could certainly be a good solution. If not, on a normal site, a JS tag would be useless; Googlebot won’t execute it.

5) What log types do you accept? IIS, NginX? What if I have custom logging (what fields do I need to log to upload to your tool and what do the headers need to be, also what format for the timestamp?)

Apache, NginX, IIS, Varnish, HAProxy, Cloudflare… and even custom formats if needed 🙂

The log format should contain the following (see the parsing sketch after this list):

  • timestamp
  • client IP – (Note from Dave: this should be the original IP)
  • URL (complete or query + domain)
  • status code
  • user-agent
  • referrer
  • response time (optional)
  • protocol or port (optional)
  • weight
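
(Note from Dave: to make those fields concrete, here’s a minimal parsing sketch assuming the standard Apache/Nginx “combined” log format, which covers most of the list – response time, protocol and port need a custom format. This is my illustration, not Seolyzer’s parser, and the sample line below is made up.)

```python
import re

# Standard "combined" log format: client IP, identity, user, timestamp, request,
# status code, size (the "weight"), referrer, user-agent.
COMBINED = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status_code>\d{3}) (?P<weight>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical sample line for illustration only.
sample = ('66.249.66.1 - - [10/Oct/2019:13:55:36 +0200] "GET /blog/ HTTP/1.1" '
          '200 5316 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)"')

match = COMBINED.match(sample)
print(match.groupdict() if match else "line did not match the combined format")
```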

6) Tell us about data security. There are lots of organizations that are fearful of giving log access data. How is the data protected? Can we request for our data to be deleted?

It’s completely legitimate; data has to be secured, especially in the current legal context. The data is secured, hosted in France on a private network, and collected over SSL. The data is never communicated to third parties. Seolyzer.io is also GDPR compliant and no personal data is stored: we anonymize IPs on the fly before storage.

This fear about logs is, in my opinion, not justified; there is more to fear from the amount of data sent via JS tags to companies that do not apply the GDPR (Google, Facebook…) and profile your users.
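
(Note from Dave: Olivier doesn’t detail the anonymization step, but a common approach – my assumption, not necessarily Seolyzer’s – is to zero out the host portion of each IP before storage, e.g. everything past the /24 of an IPv4 address or the /48 of an IPv6 address.)

```python
import ipaddress


def anonymize_ip(raw_ip):
    """Truncate an IP before storage: keep /24 for IPv4 and /48 for IPv6."""
    ip = ipaddress.ip_address(raw_ip)
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us build the network from a host address and zero the rest.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.77"))         # -> 203.0.113.0
print(anonymize_ip("2001:db8::1234:5678"))  # -> 2001:db8::
```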

7) Do you perform reverse DNS checks to ensure you’re collecting actual Googlebot requests?

Yes! Any fake IPs are bucketed into the “Bad bots” section.
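
(Note from Dave: Google’s documented verification method is a forward-confirmed reverse DNS check – resolve the IP to a hostname, check that it ends in googlebot.com or google.com, then resolve that hostname back and confirm it returns the same IP. Here’s a minimal sketch of that check, my own and not Seolyzer’s implementation.)

```python
import socket


def is_real_googlebot(ip):
    """Forward-confirmed reverse DNS: PTR must be a Google host and resolve back to the IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        _, _, forward_ips = socket.gethostbyname_ex(host)  # forward lookup
        return ip in forward_ips
    except (socket.herror, socket.gaierror):
        return False


print(is_real_googlebot("66.249.66.1"))  # genuine Googlebot range -> True (needs network access)
print(is_real_googlebot("203.0.113.5"))  # documentation range -> False
```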

About future features

1) Monitoring looks promising. What features will you be adding to this section?

As we already do for robots.txt modifications, we would like to generalize alerting across all our dashboards. It’s so important to be alerted when something is going wrong on the website: an increase in 500 status codes, very slow pages, downtime…
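
(Note from Dave: as a rough illustration of that kind of alerting – my sketch, not the upcoming feature – the check can be as simple as comparing the share of 5xx responses in a recent window against a threshold.)

```python
def should_alert(recent_status_codes, threshold=0.05):
    """Alert if more than `threshold` of the recent responses were 5xx server errors."""
    if not recent_status_codes:
        return False
    errors = sum(1 for code in recent_status_codes if 500 <= code <= 599)
    return errors / len(recent_status_codes) > threshold


# Example: 3 server errors out of 20 recent hits -> 15% > 5% -> alert.
print(should_alert([200] * 17 + [500, 502, 503]))
```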

2) What is your favorite aspect of the tool?

Definitely the crawl visualization: in a handful of seconds, we can get an idea of the SEO structure of a site and the internal linking problems it has, with PageRank calculation… and on top of that, it is beautifully colorful!

3) What is something you wish you would have done differently or that you don’t like?

The home page of Seolyzer.io: it only shows 20% of what the tool is capable of.

4) Any plans for an API? Maybe an API for a single crawl request on one page?

We get a lot of requests for an API. The first version will provide KPIs from the data, and why not, later on, a crawl of a list of URLs (or a single one).

5) Plans to integrate Google datasources like GSC or Google Analytics?

I’ll finish on this… sshhh that’s a secret 😉

 


About David Sottimano

Trying to make OpensourceSeo.org the best free information hub for the SEO industry. Personal Website here