Advanced search tricks/techniques

Sometimes you gotta do a little digging.

You might search through every page you can find on a website and still end up with nothing useful. Unfortunately that's gonna happen sometimes, because more people are starting to strip the metadata/EXIF data from their photos before they upload them, for various reasons. Sometimes it's done deliberately to prevent people from being able to see their editing methods. Another common reason is that the photographer uses an app/plug-in like TinyPixel to compress their images for SEO or space-saving purposes.

The good news is that there are often old images or entire galleries that were left untouched, and if you can find them, they'll usually have the metadata that you need.

WordPress means good news

If the photographer built their website with WordPress, then you have a very good chance of finding photos you can use. You can easily check this by right-clicking on an image, clicking Open image in new tab, and looking at the URL of the photo. For almost all WordPress-based websites, the image URL will have /wp-content/ immediately after the base domain (e.g. johnsmithphoto.com/wp-content/blahblah). This alone should give you a good chance of getting what you need without doing anything fancy.

Find the blog section of the website, if they have one, open a bunch of blog posts in new tabs, and start checking them. The farther back you go, the more likely you are to find helpful images.
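If you'd rather not eyeball every URL, the same check can be sketched in a few lines of Python. This is only a rough illustration of the /wp-content/ rule described above; the function name is made up for this example, and a site can still be WordPress even if it serves images from somewhere else.

```python
from urllib.parse import urlparse


def looks_like_wordpress(image_url: str) -> bool:
    """Return True if the URL's path contains /wp-content/.

    Hypothetical helper for illustration only. It looks at the path
    (not the query string), so it also works for URLs pasted without
    the https:// part.
    """
    path = urlparse(image_url).path
    return "/wp-content/" in path


if __name__ == "__main__":
    print(looks_like_wordpress("https://johnsmithphoto.com/wp-content/blahblah"))
```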

Check original files!

WordPress by default will automatically generate a handful of different sizes of every image you upload and display one of those instead of the original uploaded file. Sometimes these smaller images will contain the same metadata as the original and sometimes they won't, so it can be beneficial to open some images in a new tab and check the URL to make sure you're looking at the original file. If you're not, you'll need to remove the part at the end of the URL (the -WIDTHxHEIGHT bit right before the file extension) that indicates you're looking at the resized version. For example:

// https://johnsmith.com/wp-content/uploads/wedding/chris_amy_denver245-640x900.jpg

Becomes

// https://johnsmith.com/wp-content/uploads/wedding/chris_amy_denver245.jpg
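If you have a lot of URLs to fix, here is a minimal Python sketch of that same edit. It assumes the resized file always ends in -WIDTHxHEIGHT right before the extension, like the example above; the function name is just for illustration. Keep in mind that an original file whose real name happens to end in something like -1200x800 would get trimmed too, so treat the result as a guess to check, not a guarantee.

```python
import re

# Matches a trailing "-640x900" style suffix that sits directly
# before the file extension at the very end of the URL.
_SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")


def original_image_url(url: str) -> str:
    """Drop the -WIDTHxHEIGHT suffix from a resized image URL.

    URLs without such a suffix are returned unchanged.
    """
    return _SIZE_SUFFIX.sub("", url)


if __name__ == "__main__":
    resized = (
        "https://johnsmith.com/wp-content/uploads/wedding/"
        "chris_amy_denver245-640x900.jpg"
    )
    print(original_image_url(resized))
```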

Search for "hidden" pages

For any website to appear in Google's search results, Google needs to have that website in its search index, and in most cases this index will also contain links/info for every public page on that website. This can be good news for you, because a lot of times photographers will "unlink" gallery pages or blog posts so they're no longer accessible from their website's menu, but the actual pages still exist and can be viewed if you have the URL. Those URLs will sometimes remain in Google's index for many months after they've been hidden from the website.

I've built a handy tool into PresetNinja to easily check Google's index for a given site, which might turn up something helpful that you couldn't otherwise find browsing the site like normal.

Simply navigate to the website that you want to check and then open your PresetNinja menu and click the "Go" button next to Find hidden pages.

You'll be taken to a page where you can input the URL of the website that you want to search. Once you do that and hit the submit button, you'll be sent to a Google search of that website's indexed pages, and hopefully it will have some pages you couldn't see in your initial search.
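If you want to run this kind of search by hand, one common way to list the pages Google has indexed for a single site is its site: search operator. The Python sketch below builds that kind of search URL. To be clear, this is an assumption about what a search like this looks like, not a description of what the PresetNinja tool does internally, and the function name is made up for the example.

```python
from urllib.parse import urlencode, urlparse


def site_search_url(website: str) -> str:
    """Build a Google search URL limited to one website.

    Hypothetical helper: it keeps only the host name of the given
    address and wraps it in Google's "site:" operator.
    """
    # Add a scheme if the user pasted a bare domain, so urlparse
    # can find the host name.
    if "://" not in website:
        website = "https://" + website
    host = urlparse(website).netloc
    return "https://www.google.com/search?" + urlencode({"q": f"site:{host}"})


if __name__ == "__main__":
    print(site_search_url("https://johnsmithphoto.com/blog"))
```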

Archived web pages (massive hassle)

Works best for Squarespace hosted sites.

If a website gets enough traffic, it will almost certainly be archived by archive.org. Some websites might have many snapshots taken over the months/years, giving you a lot of chances to find something helpful. Some websites might have one snapshot, or none. It's hit or miss, really, but the more popular the photographer, the more likely it is that their site has been archived at least once.

These archived pages will sometimes have photos that you can see, and sometimes not (meaning a mostly blank page where photos should be), but even when they do, the metadata that you need will never be in the archived copies of the photos. This is where the hassle comes in. Even when there are no visible photos, the URLs to all the images are still in the page's source code, which you can easily see by right-clicking on the page and clicking Inspect. You'll notice all the image URLs from the original website are in there, but they will all have a bunch of archive.org junk before them, which you will have to remove to get the original URL to check. Here is an example URL you might see:

https://web.archive.org/web/20230311213306/https://images.squarespace-cdn.com/content/v1/536b7912e03a731bd9a6d6/15758917473-C1RVNBU4530CK1BAX0/C-J-WEBB-680.jpg

You'll notice there is a full Squarespace image URL within that longer URL, and you'll just need to delete the archive.org bit at the front so that you're left with

https://images.squarespace-cdn.com/content/v1/536b7912e03a731bd9a6d6/15758917473-C1RVNBU4530CK1BAX0/C-J-WEBB-680.jpg
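That trimming step is easy to sketch in Python. This example assumes the archive.org part always looks like https://web.archive.org/web/ followed by a timestamp (sometimes with a short tag such as im_ stuck on the end) and a slash; the function name is just for illustration.

```python
import re

# The archive.org prefix: host, "/web/", a numeric timestamp,
# an optional short tag like "im_", then a slash.
_WAYBACK_PREFIX = re.compile(r"^https?://web\.archive\.org/web/\d+[a-z_]*/")


def strip_wayback_prefix(url: str) -> str:
    """Return the original URL hidden inside an archive.org URL.

    URLs that do not start with the archive.org prefix are
    returned unchanged.
    """
    return _WAYBACK_PREFIX.sub("", url, count=1)


if __name__ == "__main__":
    archived = (
        "https://web.archive.org/web/20230311213306/"
        "https://images.squarespace-cdn.com/content/v1/"
        "536b7912e03a731bd9a6d6/15758917473-C1RVNBU4530CK1BAX0/C-J-WEBB-680.jpg"
    )
    print(strip_wayback_prefix(archived))
```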

This is where Squarespace is helpful. If you're checking a Squarespace site, basically any image URL you check will still exist and be checkable even if the page it was on has been gone for years, or even if the website has expired and is no longer being paid for or hosted on Squarespace. At least at the time of writing this, Squarespace just never deletes images, like, ever.

I'll make a more in-depth guide to this soon, but in the meantime, my advice is this:

If you're looking at an archived blog post, it's generally safe to grab one URL from the source code and check it. If it doesn't contain any metadata, it's very likely that the entire blog post is like that and not worth checking.

If it's a gallery page that contains, or is likely to contain, a lot of different images taken on many different days, it's probably worth checking all the URLs on the page. This will take a long time and be a huge hassle to do, but here is the easiest way.

  1. Right click the page and click Inspect to view the source code.

  2. Find the beginning of the <body> section, right-click it, and click Copy -> Copy outerHTML

  3. Extract the URLs from the code with one of these websites

  4. Alphabetize the list of URLs with one of these websites

  5. You are now left with a list of URLs that you'll need to edit to separate out the unnecessary archive.org bit (like I explained above) to be able to check the valid image URLs. Shoot me an email or chat message on presetninja.com if you need any help, and may God have mercy on your soul.
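If you're comfortable running a bit of Python, steps 3 through 5 can be roughly sketched in one go. This is only a sketch under a few assumptions: that you've pasted the copied HTML into a string, that the image links contain /web/ followed by a timestamp and then the original URL, and that the images end in .jpg, .jpeg, .png, or .webp. The function name is made up for this example, and anything after the file extension (like a ?format=... bit) is dropped.

```python
import re

# Finds "/web/<timestamp>/" (optionally with a tag like "im_") and
# captures the original image URL that follows it.
_ARCHIVED_IMAGE = re.compile(
    r"/web/\d+[a-z_]*/(https?://[^\s\"'<>)]+?\.(?:jpe?g|png|webp))",
    re.IGNORECASE,
)


def original_image_urls(html: str) -> list[str]:
    """Pull the original image URLs out of archived page HTML.

    Returns a de-duplicated, alphabetized list with the archive.org
    prefix already removed.
    """
    found = {match.group(1) for match in _ARCHIVED_IMAGE.finditer(html)}
    return sorted(found)


if __name__ == "__main__":
    # Stand-in for the HTML you copied in step 2.
    copied_html = (
        '<body><img src="https://web.archive.org/web/20230311213306im_/'
        'https://images.squarespace-cdn.com/content/v1/'
        '536b7912e03a731bd9a6d6/15758917473-C1RVNBU4530CK1BAX0/C-J-WEBB-680.jpg">'
        "</body>"
    )
    for url in original_image_urls(copied_html):
        print(url)
```

Each URL it prints is one you can open and check for metadata as usual.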
