Posts: 75
Joined: 21.Sep.2004
From: Denver, CO
Status: offline
I was looking for a quick and easy way to load URLs from urlblacklist.com and couldn't find anything that loaded the list reasonably fast, so I wrote a script that builds an XML file which imports fairly quickly. Create the files listed below in the directory where you extracted the urlblacklist (the directory that contains the category subdirectories). Then adjust includes.txt to list the categories you want and run BuildXML; you will get a blacklist.xml file that can be imported directly into a URL set in ISA. I hope people find this useful. The script is a memory hog, but it is much faster than using VBScript to talk to the ISA firewall objects. It can also be run "offline" and the result loaded when convenient.
The script is listed below. It needs 3 files in the directory to run:
includes.txt - Categories of the urlblacklist to include
LineTemplate.txt - Format of the XML line for a URL to be included
template.xml - Base format of the XML file to create, with a special line called REPLACEME where the LineTemplate is inserted for each URL to be added
I've pasted the code and file contents here:
BuildXML.vbs - script to create XML file for ISA 2004/2006

' Set up key objects
Set WShell = CreateObject("WScript.Shell")
Set fs = CreateObject("Scripting.FileSystemObject")

' Based on the bigblacklist URL files, build XML file
' using categories included in includes.txt to build
' an XML file using the template.xml input
'
' List all the input and output files
TemplateXML = "template.xml"
CatIncludes = "includes.txt"
LineTemplate = "linetemplate.txt"
OutPut = "blacklist.xml"

Set cats = fs.OpenTextFile(CatIncludes)
Set outXML = fs.CreateTextFile(OutPut, True)

'
' Create the recordset for in-memory sorting
Const adVarChar = 200
Const MaxCharacters = 40
Set rs = WScript.CreateObject("ADODB.Recordset")
rs.Fields.Append "Field1", adVarChar, MaxCharacters
rs.Open

Do While Not cats.AtEndOfStream
    CatLine = cats.ReadLine
    If Len(CatLine) > 0 Then
        ' Domains file: keep short entries with fewer than 3 dots and no "/"
        If fs.FileExists(CatLine & "\domains") Then
            Set domain = fs.OpenTextFile(CatLine & "\domains")
            Do While Not domain.AtEndOfStream
                str_domain = domain.ReadLine
                str_count = UBound(Split(str_domain, "."))
                str_subsite = InStr(1, str_domain, "/")
                If str_count < 3 And Len(str_domain) < MaxCharacters And str_subsite = 0 Then
                    rs.AddNew
                    rs("Field1") = str_domain
                    rs.Update
                End If
            Loop
            domain.Close
            Set domain = Nothing
        End If
        ' URLs file: same filter as for the domains file
        If fs.FileExists(CatLine & "\urls") Then
            Set url = fs.OpenTextFile(CatLine & "\urls")
            Do While Not url.AtEndOfStream
                str_url = url.ReadLine
                str_count = UBound(Split(str_url, "."))
                str_subsite = InStr(1, str_url, "/")
                If str_count < 3 And Len(str_url) < MaxCharacters And str_subsite = 0 Then
                    rs.AddNew
                    rs("Field1") = str_url
                    rs.Update
                End If
            Loop
            url.Close
            Set url = Nothing
        End If
    End If
Loop
cats.Close

rs.Sort = "Field1"
rs.MoveFirst

'
' Write out the XML inserting the values from the table
' created to replace the {URL} in the line template
Set fl = fs.OpenTextFile(LineTemplate)
Set ft = fs.OpenTextFile(TemplateXML)
str_line = fl.ReadLine
Do While Not ft.AtEndOfStream
    NewLine = ft.ReadLine
    If InStr(NewLine, "REPLACEME") > 0 Then
        oldval = ""
        Do Until rs.EOF
            ' Entries that already start with "." must not get a second dot
            str_prefix = "http://*."
            If Mid(rs.Fields.Item("Field1"), 1, 1) = "." Then str_prefix = "http://*"
            ' The recordset is sorted, so duplicates are adjacent; write each value once
            If rs.Fields.Item("Field1") <> oldval Then
                outXML.WriteLine Replace(str_line, "{URL}", str_prefix & rs.Fields.Item("Field1"))
            End If
            oldval = rs.Fields.Item("Field1")
            rs.MoveNext
        Loop
    Else
        outXML.WriteLine NewLine
    End If
Loop
fl.Close
ft.Close
outXML.Close
My contents of includes.txt (one category per line):
adult
audio-video
beerliquorinfo
beerliquorsale
desktopsillies
dialers
drugs
gambling
hacking
naturism
onlinegames
phishing
porn
proxy
sexuality
spyware
violence
virusinfected
warez
weapons
Dear Rob, I followed your instructions step by step, but I can't import blacklist.xml (46.7 MB) into ISA 2006 URL Sets. It shows the error message "Invalid xml declaration". Did I do anything wrong? Please help me resolve this.
I developed the list for ISA 2004, so it may not work with ISA 2006. The cause would be the template.xml and linetemplate.txt files. To create new template.xml and linetemplate.txt files, follow these simple instructions:
1. Create a new URL set in ISA 2006 called Test URL. Add a single URL to this set using something like http://test
2. Export the URL set from ISA 2006 to an XML file called template.xml
3. Replace the line with http://test in it with the single word REPLACEME. Be sure to copy the line to the clipboard or someplace as it will be needed in step #4.
4. Edit the linetemplate.txt file and replace all the text in the file with the line from #3. Do not add a carriage return at the end of this file. There should only be the single line. Replace the URL http://test with {URL}. Save this as a replacement of the linetemplate.txt file.
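Steps 3 and 4 can also be scripted. The following is only a rough sketch, under assumptions that are not in the post above: the export from step 2 was saved as export.xml (a hypothetical name, so the original export is left untouched), it can be read as plain text the same way BuildXML.vbs reads template.xml, and the test URL appears on a single line. SplitExport is a made-up helper name.

```vbscript
' Sketch: derive template.xml and linetemplate.txt from an exported URL set.
' Assumptions: the export is saved as export.xml (hypothetical name) and
' the placeholder URL http://test appears on one line of that file.

' Returns a two-element array:
'   (0) the template text with the URL line replaced by REPLACEME
'   (1) the URL line with the placeholder URL replaced by {URL}
Function SplitExport(xmlText, placeholderUrl)
    Dim lines, i, lineTmpl
    lines = Split(Replace(xmlText, vbCrLf, vbLf), vbLf)
    lineTmpl = ""
    For i = 0 To UBound(lines)
        ' Only the first matching line is used
        If lineTmpl = "" And InStr(lines(i), placeholderUrl) > 0 Then
            lineTmpl = Replace(lines(i), placeholderUrl, "{URL}")
            lines(i) = "REPLACEME"
        End If
    Next
    SplitExport = Array(Join(lines, vbCrLf), lineTmpl)
End Function

Set fs = CreateObject("Scripting.FileSystemObject")
If fs.FileExists("export.xml") Then
    Set src = fs.OpenTextFile("export.xml")
    parts = SplitExport(src.ReadAll, "http://test")
    src.Close
    If parts(1) = "" Then
        WScript.Echo "No line containing http://test was found."
    Else
        Set f = fs.CreateTextFile("template.xml", True)
        f.Write parts(0)
        f.Close
        Set f = fs.CreateTextFile("linetemplate.txt", True)
        f.Write parts(1)    ' Write, not WriteLine: no line break at the end
        f.Close
    End If
End If
```

If the export turns out to be saved in a different encoding, the OpenTextFile and CreateTextFile calls would need the matching format argument; that has not been checked here.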
These simple steps should work with future versions of ISA as well, so the template.xml and linetemplate.txt files can be kept current. After that, the script should work fine.
Dear Rob, thanks for your help, it works successfully. I have one question: when I imported the XML file, it created a new URL Set called "Test URL". Is that fine? I was expecting it to create a different URL set for each category. Also, this URL set appears to contain only porn and sex related domains, not the other categories. My includes.txt has the same category list that you posted here. Is there anything wrong? I need your advice.
The includes.txt file is what controls the categories in the single URL set. There is only a single URL set created by the export. You could modify the includes.txt file and make multiple runs, renaming the imported URL sets, but the script will not create a separate URL set for each category.
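The "multiple runs" idea could be automated along these lines. This is only a sketch with assumptions of my own: the full category list is kept in a hypothetical file called categories.txt (because includes.txt gets overwritten on every pass), BuildXML.vbs is in the current directory, and OutputNameFor is a made-up helper.

```vbscript
' Sketch: run BuildXML.vbs once per category and keep each output file.
' Assumptions: categories.txt (hypothetical) holds one category per line
' and BuildXML.vbs is in the current directory.

Function OutputNameFor(category)
    OutputNameFor = "blacklist-" & category & ".xml"
End Function

Set fs = CreateObject("Scripting.FileSystemObject")
Set sh = CreateObject("WScript.Shell")

If fs.FileExists("categories.txt") And fs.FileExists("BuildXML.vbs") Then
    Set catList = fs.OpenTextFile("categories.txt")
    Do While Not catList.AtEndOfStream
        cat = Trim(catList.ReadLine)
        If Len(cat) > 0 Then
            ' Point BuildXML.vbs at a single category
            Set inc = fs.CreateTextFile("includes.txt", True)
            inc.WriteLine cat
            inc.Close
            ' Hidden window (0), wait for the run to finish (True)
            sh.Run "cscript //nologo BuildXML.vbs", 0, True
            ' Keep the result only if something was written to it
            If fs.FileExists("blacklist.xml") Then
                If fs.GetFile("blacklist.xml").Size > 0 Then
                    If fs.FileExists(OutputNameFor(cat)) Then fs.DeleteFile OutputNameFor(cat)
                    fs.MoveFile "blacklist.xml", OutputNameFor(cat)
                End If
            End If
        End If
    Loop
    catList.Close
End If
```

Each output file still carries whatever URL set name is in template.xml, so the sets would still need renaming in ISA after each import.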
Fantastic script and help, but I'm still lagging behind. When I run build.vbs I get a Line 58, Char 1 error: "Either BOF or EOF is True, or the current record has been deleted". Could anyone give me a clue how to fix it?
That suggests your result set is empty. That can only happen if you are running the script from the wrong location (see instructions in my post about where the scripts MUST be located relative to the blacklist extracted files) or your include list is empty.
Be sure you are running the script from the right location...all paths are relative and if it can't find any matches the result set will not have any domains in it...causing an error when you try to move to the first record.
You are using the blacklist from urlblacklist.com, right? The script assumes you are using the blacklist download from urlblacklist.com and have extracted it to the directory. The extracted zip has subdirectories in each of the categories. The script is not designed to load your own list or a different one unless it follows the directory structure exactly as the urlblacklist.com one does. Make sense?
Then it should work. You could try putting in a line above the rs.movefirst as "msgbox rs.recordcount". This will tell you how many records you have in your blacklist. If it is -1, then there was an error elsewhere in the script. If it is 0, that is your issue.
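A slightly friendlier variant of that check is sketched below. DescribeCount is a hypothetical helper, not part of the original script, and the reading of -1 simply follows the explanation above. The demo at the bottom builds an empty recordset the same way BuildXML.vbs does so the snippet can be run on its own.

```vbscript
' Sketch: report the record count in words and avoid calling MoveFirst
' on an empty recordset. DescribeCount is a hypothetical helper.
Function DescribeCount(n)
    If n = 0 Then
        DescribeCount = "No entries were loaded. Check includes.txt and the directory the script is run from."
    ElseIf n < 0 Then
        DescribeCount = "Record count is " & n & "; something went wrong earlier in the script."
    Else
        DescribeCount = n & " entries loaded."
    End If
End Function

' Stand-alone demo using an empty in-memory recordset
Const adVarChar = 200
Set rs = CreateObject("ADODB.Recordset")
rs.Fields.Append "Field1", adVarChar, 40
rs.Open
WScript.Echo DescribeCount(rs.RecordCount)
If rs.RecordCount > 0 Then rs.MoveFirst
```

In BuildXML.vbs the same idea would go just above rs.MoveFirst: echo DescribeCount(rs.RecordCount) and call WScript.Quit when the count is not positive.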
The 0 means that the script is not finding any of the directories, or doesn't find any matching what is in the includes.txt. The script expects each category to be a subdirectory and within that subdirectory there should be a domains file and a urls file. These files are not being found by the script. Fix that and all will work well.
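To see which categories are not being found, a small check like the one below can be run from the same directory as BuildXML.vbs. It is only a sketch; CheckCategory is a hypothetical helper, and it treats a category as usable when at least one of the two files exists, which matches how the script tests for each file separately.

```vbscript
' Sketch: report, for every line of includes.txt, whether the category
' directory and its domains/urls files can be found.
Function CheckCategory(fso, baseDir, category)
    Dim dirPath
    dirPath = fso.BuildPath(baseDir, category)
    If Not fso.FolderExists(dirPath) Then
        CheckCategory = "missing directory"
    ElseIf Not fso.FileExists(fso.BuildPath(dirPath, "domains")) And _
           Not fso.FileExists(fso.BuildPath(dirPath, "urls")) Then
        CheckCategory = "no domains or urls file"
    Else
        CheckCategory = "ok"
    End If
End Function

Set fs = CreateObject("Scripting.FileSystemObject")
If fs.FileExists("includes.txt") Then
    Set cats = fs.OpenTextFile("includes.txt")
    Do While Not cats.AtEndOfStream
        catLine = cats.ReadLine
        ' Brackets make stray leading or trailing characters visible
        If Len(catLine) > 0 Then
            WScript.Echo "[" & catLine & "] " & CheckCategory(fs, ".", catLine)
        End If
    Loop
    cats.Close
Else
    WScript.Echo "includes.txt not found in " & fs.GetAbsolutePathName(".")
End If
```

Running it with cscript rather than wscript keeps the output in the console instead of one message box per category.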
Many thanks. After you said that it couldn't see the files, I went back and checked includes.txt, and it turns out there were spaces at the end of every word, which prevented the VBS from seeing the domain files.
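One way to make the script tolerant of this is to clean each category line before it is used to build a path. A minimal sketch follows; CleanCategory is a hypothetical helper name, not something in the original script.

```vbscript
' Sketch: remove stray whitespace around a category name.
' CleanCategory is a hypothetical helper, not part of BuildXML.vbs.
Function CleanCategory(rawLine)
    Dim s
    s = Replace(rawLine, vbTab, " ")   ' treat tabs like spaces
    s = Replace(s, vbCr, "")           ' drop stray carriage returns
    CleanCategory = Trim(s)            ' strip leading and trailing spaces
End Function

WScript.Echo "[" & CleanCategory("porn   ") & "]"   ' prints [porn]
```

In BuildXML.vbs this would mean changing the line CatLine = cats.ReadLine to CatLine = CleanCategory(cats.ReadLine).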
Hi, I have now tested ISA 2006 with URLBlacklist and your script works very well. However, I still have 2 questions:
1) Why do other scripts use both URL Sets and Domain Name Sets?
2) My XML blacklist is around 75 MB. With this blacklist loaded, the Apply button takes 5 minutes for any change. Is that normal?
I have followed the directions and have the URL list but when I try to import it into ISA Server 2004 it gives me a "file format is not valid" error. Any ideas what I can do?