Unstructured Street Data in New York

Jun 7, 2024

Milvus, Vector Database, Deep Learning, AI, Gen AI, Vision, Street Cameras, Open Data, Vision Processing, Ultralytics, Slack, PyTorch

I am in New York preparing for some incredible meetups, and I can’t wait to interact with everyone. While I am here, I wanted to start building some New York City use cases for unstructured data processing and vector databases.

You may have seen my previous article on this data, where we used Ultralytics YOLOv8 to find cool stuff in the street camera images and post it to Slack.

Streaming Street Cams to YOLO v8 With Python and NiFi to MinIO/S3

Apache NiFi, Python, YoLoV8, MinIO, S3, Images, Cameras, New York City

medium.com

I have found I can do a lot with Python, Milvus and some killer libraries, so I rebuilt this, and it’s cool. I am using Python 3.11 and 3.12 on an M1 Mac running macOS and on Ubuntu 20 on AMD. I am using the latest Milvus and the PyMilvus SDK, and I am running Attu on Docker for management.

For this project, we will need Python 3.11+ and these libraries.

LIBRARIES

ultralytics
slack_sdk
pymilvus
scikit-learn
timm
torch
numpy
torchvision
pillow

This demo reuses large portions of this great example, so if you have tried that one before, a lot of this will look familiar:

Image Search with Milvus | Milvus Documentation

image search with Milvus

milvus.io

I am going to build a simple unstructured data processing pipeline with Python and show you what I am doing step by step. Below is the full code if you want to jump ahead.

Step 1: Load List of Cameras from 511NY REST JSON Endpoint

response = requests.get(url).content

Step 2: Load JSON

json_object = json.loads(response)
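As a rough sketch of what Steps 1 and 2 produce, the snippet below parses a small hand-written payload shaped like a camera list. Only `VideoUrl` appears in this article's code; the other keys (`Name`, `RoadwayName`, `DirectionOfTravel`, `Latitude`, `Longitude`, `Url`), the helper name `parse_cameras`, and the sample values are assumptions for illustration, so check them against the actual 511NY response.

```python
import json

# Hand-written sample standing in for requests.get(url).content.
# Key names other than "VideoUrl" are assumptions, not confirmed by the article.
SAMPLE_RESPONSE = b"""
[
  {"Name": "Sample Cam A", "RoadwayName": "Sample Road", "DirectionOfTravel": "North",
   "Latitude": 40.75, "Longitude": -73.99,
   "Url": "https://example.com/a.jpg", "VideoUrl": "https://example.com/a.m3u8"},
  {"Name": "Sample Cam B", "RoadwayName": "Sample Road", "DirectionOfTravel": "South",
   "Latitude": 40.76, "Longitude": -73.98,
   "Url": null, "VideoUrl": null}
]
"""


def parse_cameras(payload: bytes) -> list[dict]:
    """Decode the JSON payload and keep only cameras that have an image URL."""
    cameras = json.loads(payload)
    usable = []
    for cam in cameras:
        if not cam.get("Url"):
            continue  # skip cameras with no still-image URL
        usable.append(
            {
                "name": cam.get("Name"),
                "roadwayname": cam.get("RoadwayName"),
                "directionoftravel": cam.get("DirectionOfTravel"),
                "latlong": f"{cam.get('Latitude')},{cam.get('Longitude')}",
                "url": cam["Url"],
                "videourl": cam.get("VideoUrl"),
            }
        )
    return usable


cameras = parse_cameras(SAMPLE_RESPONSE)
print(len(cameras), cameras[0]["latlong"])
```

Flattening each camera into the lowercase names used later for the Milvus insert keeps the rest of the pipeline independent of the upstream key spelling.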

Step 3: YOLO v8 Model

yolomodel = YOLO('yolov8n.pt')

Step 4: Connect to Slack

client = WebClient(token=slack_token)

Step 5: Connect to Milvus

milvus_client = MilvusClient( uri=MILVUS_URL )

Step 6: Build a Schema

schema = CollectionSchema(fields=fields)

Step 7: Create a Collection

milvus_client.create_collection(COLLECTION_NAME, DIMENSION, schema=schema, metric_type="COSINE", auto_id=True)

Step 8: Create an index

index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="vector", metric_type="COSINE")
milvus_client.create_index(COLLECTION_NAME, index_params)
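Both the collection and the index above use `metric_type="COSINE"`. As a reminder of what that metric measures, here is a minimal pure-Python sketch of cosine similarity. The helper name and the toy vectors are made up for illustration; Milvus computes this internally over the full embedding rather than with code like this.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real image embeddings are much longer).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

Vectors pointing the same way score close to 1 regardless of their length, and unrelated (orthogonal) vectors score close to 0, which is why larger COSINE scores mean closer image matches.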

Step 9: Iterate through JSON and Write fields and images

for jsonitems in json_object:
    videourl = jsonitems['VideoUrl']
    img = requests.get(url)
    if img.status_code == 200:
        with open(filepath, 'wb') as f:
            f.write(img.content)
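The loop above refers to a `filepath` that this excerpt does not define. One possible way to produce it, borrowing the `uuid`-based naming this article uses for result files, is sketched below. The helper name `save_image`, the `.jpg` extension, and the directory layout are assumptions, and the bytes written here are a placeholder rather than a real download.

```python
import tempfile
import uuid
from pathlib import Path

PLACEHOLDER = b"placeholder image bytes"  # stands in for img.content


def save_image(content: bytes, directory: Path, extension: str = "jpg") -> Path:
    """Write image bytes to a uniquely named file and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    filepath = directory / f"{uuid.uuid4()}.{extension}"
    with open(filepath, "wb") as f:
        f.write(content)
    return filepath


# In the pipeline this would run only after checking img.status_code == 200.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(PLACEHOLDER, Path(tmp) / "camimages")
    saved_size = saved.stat().st_size
    print(saved.name, saved_size)
```

A fresh UUID per download avoids overwriting an earlier frame from the same camera when the script runs repeatedly.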

Step 10: Run YOLO v8 Prediction on Image

results = yolomodel.predict(filepath, stream=False, save=True, imgsz=640, conf=0.5)

Step 11: Iterate through YOLO results

for result in results:
    outputimage = result.path
    savedir = result.save_dir
    speed = result.speed
    names = result.names
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
    resultfilename = "camimages/{0}.png".format(uuid.uuid4())
    result.save(filename=resultfilename)  # save to disk
    strText = ":tada:" + str(strname) + ":" + str(roadwayname)
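Inside that loop, `result.names` maps class ids to labels and `result.boxes.cls` holds the predicted class id for each box. A small, hypothetical helper such as `count_detections` below could turn those into a per-label count for the Slack message. It takes plain Python values so it runs without Ultralytics installed; the sample ids and labels are illustrative, not real camera output.

```python
from collections import Counter


def count_detections(names: dict[int, str], class_ids: list[float]) -> dict[str, int]:
    """Count detected objects per label, given one YOLO class id per box."""
    labels = (names[int(class_id)] for class_id in class_ids)
    return dict(Counter(labels))


# With Ultralytics this would be roughly:
#   counts = count_detections(result.names, result.boxes.cls.tolist())
sample_names = {0: "person", 2: "car", 7: "truck"}
sample_class_ids = [2.0, 2.0, 7.0, 0.0, 2.0]
counts = count_detections(sample_names, sample_class_ids)
summary = ", ".join(f"{label}: {n}" for label, n in sorted(counts.items()))
print(summary)
```

The resulting `summary` string could be appended to `strText` so each Slack post says what the model found, not just which camera it came from.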

Step 12: Post and Upload to Slack

try:
    response = client.files_upload_v2(
        channel="C06NE1FU6SE",
        file=resultfilename,
        title=roadwayname,
        initial_comment="Transformed image " + str(strfilename),
    )
except SlackApiError as e:
    assert e.response["error"]

Finally, insert into Milvus.

try:
    imageembedding = extractor(resultfilename)
    milvus_client.insert(
        COLLECTION_NAME,
        {"vector": imageembedding, "filepath": filepath, "url": url,
         "videourl": videourl, "latlong": latlong, "name": strname,
         "roadwayname": roadwayname, "directionoftravel": directionoftravel},
    )
    print("resultfilename:" + resultfilename)
    print("Milvus:sent collection:" + roadwayname)
except Exception as e:
    print("An error:", e)
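Milvus rejects vectors whose length differs from the vector field's dimension, so it can help to check the embedding before inserting. The sketch below builds the row dictionary with the field names used in this article; the helper `build_row`, the value `DIMENSION = 512`, and the sample metadata are assumptions for illustration.

```python
DIMENSION = 512  # assumed value; use the output size of your embedding model

REQUIRED_FIELDS = ("filepath", "url", "videourl", "latlong",
                   "name", "roadwayname", "directionoftravel")


def build_row(embedding: list[float], metadata: dict[str, str]) -> dict:
    """Combine an embedding and camera metadata into one row for insert."""
    if len(embedding) != DIMENSION:
        raise ValueError(f"expected {DIMENSION} values, got {len(embedding)}")
    missing = [key for key in REQUIRED_FIELDS if key not in metadata]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    row = {"vector": list(embedding)}
    row.update({key: metadata[key] for key in REQUIRED_FIELDS})
    return row


# Illustrative metadata; real values come from the camera JSON.
sample_metadata = {
    "filepath": "camimages/sample.jpg",
    "url": "https://example.com/a.jpg",
    "videourl": "https://example.com/a.m3u8",
    "latlong": "40.75,-73.99",
    "name": "Sample Cam A",
    "roadwayname": "Sample Road",
    "directionoftravel": "North",
}
row = build_row([0.0] * DIMENSION, sample_metadata)
# milvus_client.insert(COLLECTION_NAME, row) would then send it to Milvus.
print(sorted(row))
```

Failing early with a clear message is easier to debug than a server-side error raised in the middle of a long camera loop.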

Now we can run the application and start processing street camera images.

SLACK_BOT_TOKEN="tokenFindYourown" NYURL="https://511ny.org/api/getcameras?key=getakey&format=json" python streetcams.py
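The command passes the Slack token and the camera endpoint as environment variables. A minimal sketch of reading them at startup follows. `SLACK_BOT_TOKEN` and `NYURL` come from the command above, while the `load_settings` helper and its fail-fast behavior are assumptions about how one might structure this, not necessarily how `streetcams.py` does it.

```python
import os


def load_settings(env=None) -> dict[str, str]:
    """Read required settings from the environment, failing early if one is missing."""
    env = os.environ if env is None else env
    settings = {}
    for key in ("SLACK_BOT_TOKEN", "NYURL"):
        value = env.get(key, "").strip()
        if not value:
            raise RuntimeError(f"environment variable {key} is not set")
        settings[key] = value
    return settings


# An explicit mapping is passed here so the sketch runs without real credentials.
settings = load_settings({
    "SLACK_BOT_TOKEN": "tokenFindYourown",
    "NYURL": "https://511ny.org/api/getcameras?key=getakey&format=json",
})
print(sorted(settings))
```

Keeping the token and API key out of the source file also makes it safer to publish the script.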

Let’s search against our new camera images.

AIM-NYCStreetCams/streetcamsearch.ipynb at main · tspannhw/AIM-NYCStreetCams

Python, Slack, Milvus, Ultralytics, Street Cameras, JSON — AIM-NYCStreetCams/streetcamsearch.ipynb at main ·…

github.com
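The notebook linked above contains the author's search code. As a hedged sketch of the general shape of such a query, the function below calls `search` on a `MilvusClient`-like object and flattens the hits. The collection name `nycstreetcams`, the chosen output fields, and the `FakeClient` stand-in (used only so the example runs without a Milvus server) are assumptions. The result shape follows my understanding of the pymilvus 2.4 `MilvusClient.search` return value; in practice the query vector would come from the same `extractor` used at insert time.

```python
def find_similar(client, collection_name, query_vector, top_k=5):
    """Return (distance, roadwayname, filepath) for the nearest stored images."""
    results = client.search(
        collection_name=collection_name,
        data=[query_vector],  # one query vector -> one list of hits
        limit=top_k,
        output_fields=["roadwayname", "filepath"],
        search_params={"metric_type": "COSINE"},
    )
    hits = results[0]
    return [
        (hit["distance"], hit["entity"]["roadwayname"], hit["entity"]["filepath"])
        for hit in hits
    ]


class FakeClient:
    """Stand-in that mimics the assumed result shape of MilvusClient.search."""

    def search(self, collection_name, data, limit, output_fields, search_params):
        canned = [
            {"id": 1, "distance": 0.93,
             "entity": {"roadwayname": "Sample Road", "filepath": "camimages/a.jpg"}},
            {"id": 2, "distance": 0.71,
             "entity": {"roadwayname": "Other Road", "filepath": "camimages/b.jpg"}},
        ]
        return [canned[:limit] for _ in data]


matches = find_similar(FakeClient(), "nycstreetcams", [0.1, 0.2, 0.3], top_k=1)
print(matches)
```

Swapping `FakeClient()` for the real `milvus_client` and passing an embedding of an example image should return the stored camera frames that look most like it.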

Showing Collection in ATTU

GitHub — zilliztech/attu: The GUI for Milvus

The GUI for Milvus. Contribute to zilliztech/attu development by creating an account on GitHub.

github.com

OUTPUT IN SLACK

DEMO

REAL WORLD

Tech Week — Soft Meetup Debut — June 2024

Milvus, Unstructured Data, NYC Meetup, Tech Week, Gen AI, Big Data

medium.com

AI Camp NYC Report — February 2024

22-Feb-2024: Microsoft Times Square. Room: Central Park West 6501. AI Camp NYC.

medium.com

Not only is our data in the real world, so are we. Come to our beautiful new location with our friendly hosts. I swear we are not in the Matrix.

UPCOMING MEETUPS

Upcoming Zilliz Unstructured Data Meetups

Join Zilliz and other AI industry experts to learn, share and discuss unstructured data in LLMs at monthly meetups in…

zilliz.com

Unstructured Data Meetup New York | Meetup

This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector…

www.meetup.com

Unstructured Data Meetup · Events Calendar

View and subscribe to events from Unstructured Data Meetup on Luma. meetups for people working in unstructured data…

lu.ma

June 18, 2024 — Princeton

GenAI Gathering | Startup Grind

In-person Event — “Just as electricity transformed almost everything 100 years ago, today I actually have a hard time…

www.startupgrind.com

June 20, 2024 — Times Square at Microsoft Reactor with AI Camp NYC

AI Meetup (NYC): GenAI, LLMs, ML and Data

AICamp: Learn and practice AI/ML from anywhere any time with webinars, workshops and courses.

www.aicamp.ai

CODE

GitHub — tspannhw/AIM-NYCStreetCams: Python, Slack, Milvus, Ultralytics, Street Cameras, JSON

Python, Slack, Milvus, Ultralytics, Street Cameras, JSON — tspannhw/AIM-NYCStreetCams

github.com

NEXT STEPS

If you have seen my previous meetups, talks, articles or demos, you know that I have been acquiring transit data. My next step is to grab transit, weather and other public data sources for New York City and enhance them with images and image processing. With our images vectorized, we can find the nearest matches to example images! We will also have correlated data for latitude, longitude, speeds, temperatures and lots of other attributes, and we will see how to add JSON and Array fields to Milvus. Finally, we will use NVIDIA edge devices to capture, vectorize and run various models right on the device.

Iteration 1: Building a System to Consume All the Real-Time Transit Data in the World At Once

Source Code: https://github.com/tspannhw/FLaNK-EveryTransitSystem

medium.com

Iteration 2: Building a System to Consume All the (Unsecured) Real-Time Transit Data in the World

This is the remix.

medium.com

Kafka for Edge AI on Jetson Nano: Enabling Efficient Data Streaming

NVIDIA Jetson Nano, Data Streaming, Apache Kafka, Python, Java

medium.com

Next up I will be building a notebook with some extra comments and display.

NEW YORK CITY DATA

Subways and Transit Updates in Real-Time

Apache NiFi, Apache Kafka, Apache Flink, JavaScript, Python, GTFS, Postgresql, SQL

medium.com

NYC Traffic!?!??! Are You Kidding Me?

Apache NiFi, Python, Traffic, JSON, Web Camera, REST, XML, RSS, JSON

medium.com

The unstructured database for New York City and beyond

RESOURCES

ultralytics/ultralytics/data at main · ultralytics/ultralytics

NEW — YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite — ultralytics/ultralytics/data at main ·…

github.com

Transit in Sao Paulo, Brasil — FLaNK Style

Streaming with NiFi, Kafka, Flink

medium.com

Use JSON Fields | Milvus Documentation

This guide explains how to use the JSON fields, such as inserting JSON values as well as searching and querying in JSON…

milvus.io

JSON and Metadata Filtering in Milvus — Zilliz blog

A brief review of how to ingest your data with JSON in your Milvus vector database

zilliz.com

Use Array Fields | Milvus Documentation

This guide explains how to use the array fields, such as inserting array values as well as searching and querying in…

milvus.io

Manage Schema | Milvus Documentation

Learn how to define a schema in Milvus.

milvus.io

Unstructured Data Engineering for AI

Timothy Spann’s articles, videos, images, examples, documentation and source code on Data — unstructured and structured…

www.youtube.com

search() — pymilvus v2.4.x for Milvus

milvus.io

Join the Milvus Discord Server!

Check out the Milvus community on Discord — hang out with 1689 other members and enjoy free voice and text chat.

discord.com

Written by Tim Spann