Remove unused Magento product images

Magento has the nasty habit of leaving unused product image files in it’s var/media directory. If you, like one of our clients, import your catalog at a regular interval using Magento’s Dataflow or similar this will fill up your precious data space. Especially when your catalogue consists of 12.000+ SKUs with one or more product images like was the situation here…

The solution
There is a solution in the form of the Image Clean extension created by Defcon. That extension however cannot be scheduled using cron, and it requires manual interaction. Panticz was probably thinking the same thing, and put together a shell script to clean up the unused Magento product images. However, his script needs a config file containing some variables, also the paths probably need manual adjusting if you want to use it yourself.

A man of little words
Panticz, apparently being a man of little words, didn’t leave an instruction to go with it. So we put together a version of his script that is easier to deploy. It will configure itself if you tell it where your Magento install is located. Just put it somewhere where you can run it, and make sure you change MAGENTO_PATH=”/path/to/your/magento/httpdocs” to the path of your Magento install. It’s suited for use in a cronjob, say once a week.

#!/bin/bash
MAGENTO_PATH="/var/www/path/to/vhost/httpdocs/path/to/magento"
LOG=${MAGENTO_PATH}/var/log/imagecleanup.log
DB_USER=$(sed -n 's|<username><\!\[CDATA\[\(.*\)\]\]></username>|\1|p' ${MAGENTO_PATH}/app/etc/local.xml | tr -d ' ')
DB_PASS=$(sed -n 's|<password><\!\[CDATA\[\(.*\)\]\]></password>|\1|p' ${MAGENTO_PATH}/app/etc/local.xml | tr -d ' ')
DB_NAME=$(sed -n 's|<dbname><\!\[CDATA\[\(.*\)\]\]></dbname>|\1|p' ${MAGENTO_PATH}/app/etc/local.xml | tr -d ' ')
DB_PREFIX=$(sed -n 's|<table_prefix><\!\[CDATA\[\(.*\)\]\]></table_prefix>|\1|p' ${MAGENTO_PATH}/app/etc/local.xml | tr -d ' ')

function search_db() {
	COUNT=$(mysql -u ${DB_USER} -p${DB_PASS} ${DB_NAME} --execute="SELECT count(*) FROM ${DB_PREFIX}catalog_product_entity_media_gallery WHERE value = \"$1\"")
	echo $(echo ${COUNT} | cut -d" " -f2)
}

echo "Starting image cleanup " $(date) | tee -a ${LOG}
IMG_PATH=${MAGENTO_PATH}/media/catalog/product/
for IMG in $(find ${IMG_PATH} -name '*.jpg' ! -path '*cache*' ! -name 'google*'); do
	REL_IMG=/${IMG:${#IMG_PATH}}
	if [ $(search_db ${REL_IMG/'${MAGENTO_PATH}/media/catalog/product'/}) != 1 ]; then
		IMG=${IMG##*/}
		for CACHE_IMG in $(find ${MAGENTO_PATH}/media/catalog/product/ -name "${IMG}"); do
			echo "Found unused image ${CACHE_IMG}"
			if [ "$1" ] && [ $1 == 'cleanup' ]; then
				echo "Removing unused image ${CACHE_IMG}" | tee -a ${LOG}
				rm "${CACHE_IMG}"
			fi
		done
	else 
		echo "Not touching " ${IMG}
	fi
done
echo "Finished image cleanup " $(date) | tee -a ${LOG}

How can I use this?
1. Download the script here. Extract it and put it somewhere in your Magento installation. For example in the shell directory.
2. Edit the script and change the MAGENTO_PATH to the absolute path (starting at /, but not ending with a /) of your Magento installation. If you do not know where that is, leave a comment, or bang your head against a wall, whichever suits you best/
3. Make the file executable. You can do this using your (S)FTP software or on the command line. For example using “chmod +rx imagecleanup.sh”. Again leave if a comment if your having trouble doing that.
4. Run it in test modus (without parameters). It won’t delete anyhing! But if you see does not want to touch some files, but remove others you know its working. If you are happy go to the next step.
4. Run it on the shell with the paramter cleanup. So ‘./imagecleanup.sh cleanup’. Or create a cronjob in your hosting panel calling this script once a week.

Disclaimer
If you lose images you still needed you’re on your own, so test things out (as described) somewhere and always make sure you have a backup. As with all our free code, we are not responsible for the havoc it may cause. It’s tested for Magento 1.4+ in several real life environments. We however are not obligated by any way to provide support if thing go wrong (we however, always do, if you ask us nice… or if you buy us candy), nor are we responsible for anything, in any way ;-).

Like this free extension?
Let us know, and like our Facebook page! Want more? Leave a comment expressing your wishes in regard to this extension, and we will keep you in mind working on updates.

24 thoughts on “Remove unused Magento product images

    • Hello Dario,

      We build Magento servers for a living ;-). You need a VPS with decent I/O performance and enough memory and we can make it fly like that.

      Cheerz

  1. Hello Pieter,

    thanks a lot for your script. I ran it with a little modification because our database lives on another host:

    […]
    DB_HOST=$(sed -n ‘s||\1|p’ ${MAGENTO_PATH}/app/etc/local.xml | tr -d ‘ ‘)
    […]
    COUNT=$(mysql -h ${DB_HOST} -u ${DB_USER} -p${DB_PASS} ${DB_NAME} –execute=”SELECT count(*) FROM ${DB_PREFIX}catalog_product_entity_media_gallery WHERE value = \”$1\””)
    […]

    However, your script saved me a lot of work.
    Thanks for sharing it.

    Ansgar

  2. Hi,

    the shell script is nice, but I prefer to modify the Image Clean Module to add an observer and schedule it via cron. If anyone’s interested I can post the code.

    Regards, Tom

  3. This script is working great. but its kinda slow, and for huge catalog its definitely taking huge amount of time. I have an idea which i think would make it faster. I noticed the script is searching the database for every single item it finds in product image folder, thats a time and resource consuming process.

    Can we not just create an array object and push all the image urls/names in that array and use grep to find the image names from that array rather than sending a sql command?

    Not sure if it’s really gonna make some improvement in terms of time taking but m sure it could save alot of time.

  4. done banging my head… not helping

    How do I get the absolute path of the installation? I have no idea, a newest newbie here

    some say: write “”

    write it where??

    • No don’t use ~. This will only work if the Magento/webserver root is your homedirectory. Which it is in most (shared) hosting accounts, but not all.

      If you contact us on Facebook we can help you out.

    • No it will process all stores. It does support multi-store environments however, you just cannot select a specific store to clean.

  5. Thanks for this script my media folder went from 9GB to around 2.9GB.

    I changed the find command to also delete png and gif images because we only had png images in our media folder.

    find ${IMG_PATH} \( -name ‘*.jpg’ -o -name ‘*.png’ -o -name ‘*.gif’ \) ! -path ‘*cache*’ ! -name ‘google*’

    • Another problem is the test to check if the image is in use fails when multiple products use the same image. The select count(*) returns a number larger then 1 and the check at line 20 thinks the image should be deleted.

      The fixed line 20 is this instead of !=1 I used == 0

      if [ “$(search_db ${REL_IMG/’${MAGENTO_PATH}/media/catalog/product’/})” == “0” ]; then

  6. I had to remove a lot of images and the current script is very slow. I created a php alternative which i think is much faster. Sorry no file logging. You can redirect the echo to file. So mainly it works the same way as the script from Pieter. Pieter thanks for the setup!

    getStore()->setId(Mage_Core_Model_App::ADMIN_STORE_ID);

    $media = Mage::getBaseDir(‘media’).’/catalog/product';

    echo “Query database for images …\n”;
    $query = “SELECT value FROM catalog_product_entity_media_gallery”;
    $data = Mage::getSingleton(‘core/resource’)->getConnection(‘core_read’)->fetchAll($query);

    $dbData=array();
    foreach($data as $item){
    $dbData[$item[‘value’]]=$item[‘value’];
    }
    echo “Images found in database:”.count($dbData).”\n”;

    echo “Search images in media directory …\n”;
    $images = findFiles($media, array(‘jpg’));
    echo “Images found under directory ($media):”.count($images[‘jpg’]).”\n”;

    echo “Start removing images …\n”;
    $removedCount=0;
    $skippedCount=0;
    foreach($images[‘jpg’] as $image) {
    if(strpos($image,’cache’)!==false)
    {
    //echo “Skip cached image : $image\n”;
    continue;
    }

    $imageCleanup = str_replace($media,”,$image);
    if(isset($dbData[$imageCleanup]))
    {
    echo “Skip image is in database : $image\n”;
    $skippedCount++;
    continue;
    }
    else
    {
    echo “Remove image : $image\n”;
    if($testrun==false) unlink($image);
    $removedCount++;
    }
    }

    echo “Done, removed $removedCount images and skipped $skippedCount images.\n”;

    function findFiles($directory, $extensions = array()) {
    function glob_recursive($directory, &$directories = array()) {
    foreach(glob($directory, GLOB_ONLYDIR | GLOB_NOSORT) as $folder) {
    $directories[] = $folder;
    glob_recursive(“{$folder}/*”, $directories);
    }
    }
    glob_recursive($directory, $directories);
    $files = array ();
    foreach($directories as $directory) {
    foreach($extensions as $extension) {
    foreach(glob(“{$directory}/*.{$extension}”) as $file) {
    $files[$extension][] = $file;
    }
    }
    }
    return $files;
    }

    ?>

  7. Hi Pieter,

    You did Great! I modified your script to work for a remote database hostname.

    MAGENTO_PATH=”/var/www/path/to/vhost/httpdocs/path/to/magento”
    LOG=${MAGENTO_PATH}/var/log/imagecleanup.log
    DB_USER=$(sed -n ‘s||\1|p’ ${MAGENTO_PATH}/app/etc/local.xml | tr -d ‘ ‘)
    DB_PASS=$(sed -n ‘s||\1|p’ ${MAGENTO_PATH}/app/etc/local.xml | tr -d ‘ ‘)
    DB_NAME=$(sed -n ‘s||\1|p’ ${MAGENTO_PATH}/app/etc/local.xml | tr -d ‘ ‘)
    DB_PREFIX=$(sed -n ‘s||\1|p’ ${MAGENTO_PATH}/app/etc/local.xml | tr -d ‘ ‘)
    DB_REMOTEHOST=$(sed -n ‘s||\1|p’ ${MAGENTO_PATH}/app/etc/local.xml | tr -d ‘ ‘)

    function search_db() {
    COUNT=$(mysql -u ${DB_USER} -p${DB_PASS} -h ${DB_REMOTEHOST} ${DB_NAME} –execute=”SELECT count(*) FROM ${DB_PREFIX}catalog_product_entity_media_gallery WHERE $
    echo $(echo ${COUNT} | cut -d” ” -f2)
    }

    echo “Starting image cleanup ” $(date) | tee -a ${LOG}
    IMG_PATH=${MAGENTO_PATH}/media/catalog/product/
    for IMG in $(find ${IMG_PATH} -name ‘*.jpg’ ! -path ‘*cache*’ ! -name ‘google*’); do
    REL_IMG=/${IMG:${#IMG_PATH}}
    if [ $(search_db ${REL_IMG/’${MAGENTO_PATH}/media/catalog/product’/}) != 1 ]; then
    IMG=${IMG##*/}
    for CACHE_IMG in $(find ${MAGENTO_PATH}/media/catalog/product/ -name “${IMG}”); do
    echo “Found unused image ${CACHE_IMG}”
    if [ “$1″ ] && [ $1 == ‘cleanup’ ]; then
    echo “Removing unused image ${CACHE_IMG}” | tee -a ${LOG}
    rm “${CACHE_IMG}”
    fi
    done
    else
    echo “Not touching ” ${IMG}
    fi
    done
    echo “Finished image cleanup ” $(date) | tee -a ${LOG}

    exit 0;

    Regards,
    Lui =]

  8. Great work guys, lovely little script to have ticking away in the background on a cron job once a month and tidy up the loose ends.

    We also use a Magento Extension from Fishpig called smushit, to – a lot of our images, google it, it’s worth a look.

    All in it’s a great script and we just needed to add in the remote database host fix as documented above, although I had already done it by the time I spotted the comments.. :-) DOH!

    Cheers guys thanks alot, will drop a link to this on my blog as I think it’s defiantly worth people knowing about.

  9. Hi Pieter,

    I tried to use your script however, I got 2 errors:

    1) line 30: unexpected EOF while looking for matching `”‘
    2) line 33: syntax error: unexpected end of file

    Can you please help me? Thanks

Leave a Reply

Your email address will not be published. Required fields are marked *


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>